Vegan-specific signature implies healthier metabolic profile: findings from diet-related multi-omics observational study based on different European populations
Statistical report for microbiome analysis (SGB level, prevalence ≥ 30% in training dataset)
Authors and affiliations
Monika Cahova1,*, Anna Ouradova2,*, Giulio Ferrero3,4,*, Miriam Bratova1, Nikola Daskova1, Klara Dohnalova5, Marie Heczkova1, Karel Chalupsky5, Maria Kralova6,7, Marek Kuzma8, Filip Tichanek1, Lucie Najmanova8, Barbara Pardini10, Helena Pelantová8, Radislav Sedlacek5, Sonia Tarallo9, Petra Videnska10, Jan Gojda2,#, Alessio Naccarati9,#
* These authors have contributed equally to this work and share first authorship
# These authors have contributed equally to this work and share last authorship
1 Institute for Clinical and Experimental Medicine, Prague, Czech Republic
2 Department of Internal Medicine, Kralovske Vinohrady University Hospital and Third Faculty of Medicine, Charles University, Prague, Czech Republic
3 Department of Clinical and Biological Sciences, University of Turin, Turin, Italy
4 Department of Computer Science, University of Turin, Turin, Italy
5 Czech Centre for Phenogenomics, Institute of Molecular Genetics of the Czech Academy of Sciences, Prague, Czech Republic
6 Ambis University, Department of Economics and Management, Prague, Czech Republic
7 Department of Applied Mathematics and Computer Science, Masaryk University, Brno, Czech Republic
8 Institute of Microbiology of the Czech Academy of Sciences, Prague, Czech Republic
9 Italian Institute for Genomic Medicine (IIGM), c/o IRCCS Candiolo, Turin, Italy
10 Mendel University, Department of Chemistry and Biochemistry, Brno, Czech Republic
This is a statistical report for the study "Vegan-specific signature implies healthier metabolic profile: findings from diet-related multi-omics observational study based on different European populations", which has been submitted to [TO BE ADDED].
When using this code or data, cite the original publication:
TO BE ADDED
BibTeX citation for the original publication:
TO BE ADDED
Original GitHub repository: https://github.com/filip-tichanek/ItCzVegans
Statistical reports can be found on the reports hub.
Data analysis is described in detail in the statistical methods report.
1 Introduction
This project explores potential signatures of a vegan diet across the microbiome, metabolome, and lipidome. We used data from healthy vegan and omnivorous human subjects in two countries (Czech Republic and Italy), with subjects grouped by Country and Diet, resulting in four distinct groups.
To assess the generalizability of these findings, we validated them in an independent cohort from the Czech Republic (external validation).
1.1 Statistical Methods
The statistical modeling approach is described in detail in this report. Briefly, the methods used included:
Multivariate analysis: We conducted multivariate analyses (PERMANOVA, PCA, correlation analyses) to explore the effects of diet, country, and their possible interaction (diet:country) on the microbiome, lipidome, and metabolome compositions in an integrative manner. This part of the analysis is not available on the GitHub page, but the code will be provided upon request.
Linear models: Linear models were applied to estimate the effects of diet, country, and their interaction (diet:country) on individual lipids, metabolites, bacterial taxa, and pathways ("features"). Features that differed significantly between diet groups (based on the estimated average effect of diet across both countries, adjusted for multiple comparisons at a false discovery rate (FDR) < 0.1) were further examined in the independent validation cohort to assess whether these associations were reproducible.
Predictive models (elastic net): We used elastic net (regularized) logistic regression to predict vegan status from the metabolome, lipidome, microbiome, and pathways (one predictive model per dataset, i.e., four elastic net models in total). These models were internally validated using out-of-bag bootstrap resampling. The ability of each model to discriminate between diet groups was evaluated using the out-of-sample (optimism-corrected) area under the receiver operating characteristic curve (ROC-AUC). The models trained on the training data were then used to estimate, for each subject in the independent validation cohort, the predicted probability of being vegan. This predicted probability was subsequently used as a score to discriminate between diet groups in the validation cohort (external validation).
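The linear-model step can be illustrated with a minimal sketch on simulated data; the feature names, effect size and sample size are hypothetical, and the report's own model code may differ. It fits one model per feature with a diet × country interaction, uses sum-to-zero contrasts so that the diet coefficient represents the diet effect averaged over the two countries, and controls the false discovery rate with the Benjamini-Hochberg procedure via `p.adjust()`.

```r
set.seed(478)
n <- 80
# Balanced toy design: 2 diets x 2 countries, 20 subjects per cell
dat <- data.frame(
  Diet    = factor(rep(c("OMNI", "VEGAN"), each = n / 2)),
  Country = factor(rep(c("CZ", "IT"), times = n / 2))
)

# Hypothetical features on a CLR-like scale: only feat1 truly differs by diet
y <- cbind(
  feat1 = 1.0 * (dat$Diet == "VEGAN") + rnorm(n),
  feat2 = rnorm(n),
  feat3 = rnorm(n)
)

# Sum-to-zero contrasts: the Diet coefficient is the diet effect averaged
# over the two countries, even with the Diet:Country interaction present
contrasts(dat$Diet)    <- contr.sum(2)
contrasts(dat$Country) <- contr.sum(2)

fit_one <- function(v) {
  fit <- lm(v ~ Diet * Country, data = dat)
  est <- coef(summary(fit))["Diet1", ]
  # contr.sum codes OMNI = +1 and VEGAN = -1, so Diet1 is half of the
  # OMNI minus VEGAN difference; VEGAN minus OMNI is therefore -2 * Diet1
  c(effect = -2 * unname(est["Estimate"]), p = unname(est["Pr(>|t|)"]))
}

res <- as.data.frame(t(apply(y, 2, fit_one)))
res$q <- p.adjust(res$p, method = "BH")   # Benjamini-Hochberg FDR
res$significant <- res$q < 0.1
print(res)
```

In this saturated 2 × 2 model the sum-coded `Diet1` coefficient equals half of the OMNI minus VEGAN difference averaged with equal weights over the two countries, which is why the sketch rescales it by -2.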
2 Initiation
2.1 Set working directory
setwd('/home/ticf/GitRepo/ticf/478_MOCA_italian')
2.2 Load initiation file
source('478_initiation.R')
3 Data
3.1 Load all original data
3.1.1 Training set
3.1.1.1 Connect metadata from lipidome table
## Sample IDs, Country and Diet of the training cohort come from the lipidome sheet
training_metadata <- read.xlsx('gitignore/data/lipidome_training_cohort.xlsx') %>%
select(Sample, Country, Diet) %>%
mutate(ID = Sample)
3.1.1.2 Connect training data
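The chunk that follows relies on a small reshaping trick: the MetaPhlAn table is read without a header, its first row supplies the column names, and after transposition the first row ('ID' followed by the taxon names) becomes the column names of a samples × taxa table. A base-R sketch of the same idea on a made-up three-row table, with the metadata attached by matching sample IDs (a left join that keeps every abundance row); all IDs, taxa and values are hypothetical:

```r
# Made-up table as read.table(header = FALSE) would return it:
# row 1 is the file header, column 1 holds "ID" and the taxon names
raw <- data.frame(
  V1 = c("ID", "Taxon_A|SGB1", "Taxon_B|SGB2"),
  V2 = c("S1", "10", "0"),
  V3 = c("S2", "5", "5"),
  stringsAsFactors = FALSE
)

# Transpose: samples become rows; the first row holds "ID" + taxon names
tr <- t(as.matrix(raw))
abund <- as.data.frame(tr[-1, , drop = FALSE], stringsAsFactors = FALSE)
colnames(abund) <- tr[1, ]

# Abundances arrive as character strings; convert them to numeric
taxa <- setdiff(colnames(abund), "ID")
abund[taxa] <- lapply(abund[taxa], as.numeric)

# Left join of the metadata by sample ID (unmatched IDs would get NA)
meta <- data.frame(ID = c("S2", "S1"), Diet = c("OMNI", "VEGAN"),
                   stringsAsFactors = FALSE)
abund$Diet <- meta$Diet[match(abund$ID, meta$ID)]
abund <- abund[, c("ID", "Diet", taxa)]
print(abund)
```

Using `match()` keeps the row order of the abundance table and leaves names such as `Taxon_A|SGB1` untouched.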
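## MetaPhlAn 4 SGB table (taxa in rows, samples in columns), read without a
## header, so its first row holds the column names
data_microbiome_original_raw <- read.table(
'gitignore/data/0_Data_Metaphlan4_SGB_subset.txt')
colnames(data_microbiome_original_raw) <- data_microbiome_original_raw[1,]
## Transpose to samples in rows and taxa in columns
data_microbiome_original_raw <- data_microbiome_original_raw %>%
t() %>%
data.frame()
## The first row now holds 'ID' followed by the taxon names
colnames(data_microbiome_original_raw) <- data_microbiome_original_raw[1,]
## Drop that row, attach the metadata, and put the metadata columns first
data_microbiome_original_raw <- data_microbiome_original_raw[-1, ] %>%
left_join(training_metadata, by = 'ID') %>%
mutate(Data = 'train', Sample = ID) %>%
select(Sample, Data, Diet, Country, everything()) %>%
select(-ID)
dim(data_microbiome_original_raw)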
## [1] 166 684
3.1.2 Validation set
3.1.2.1 Get metadata from lipidome table
## Validation metadata from the lipidome sheet: X1 = sample ID, X2 = diet group
data_lipids_validation <- read.xlsx('gitignore/data/lipidome_validation_cohort.xlsx') %>%
mutate(ID = X1) %>%
select(ID, X2)
3.1.2.2 Connect validation data
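The chunk below rewrites the sequencing sample IDs before joining them to the lipidome sheet: whitespace is trimmed, everything from the first '.' onwards is dropped, and 'K' is prefixed. A small sketch of that rule, plus a check for IDs that find no partner in the metadata (these would receive a missing Diet after the left join). The raw IDs are made up and `harmonize_id()` is a hypothetical helper name:

```r
# Hypothetical helper reproducing the ID rule used in the chunk below
harmonize_id <- function(id) paste0("K", gsub("\\..*", "", trimws(id)))

raw_ids   <- c(" 284.S12", "17", "305.fastq ")  # made-up raw IDs
lipid_ids <- c("K284", "K17")                    # made-up metadata IDs

ids <- harmonize_id(raw_ids)
print(ids)          # values: K284, K17, K305

# IDs without a metadata match would end up with Diet = NA after left_join
unmatched <- setdiff(ids, lipid_ids)
print(unmatched)    # value: K305
```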
## Validation SGB table: same layout as the training table
data_microbiome_validation_raw <- read.table(
'gitignore/data/0_Data_Metaphlan4_SGB_subset_validation.txt')
colnames(data_microbiome_validation_raw) <- data_microbiome_validation_raw[1,]
## Transpose to samples in rows and taxa in columns
data_microbiome_validation_raw <- data_microbiome_validation_raw %>%
t() %>%
data.frame()
colnames(data_microbiome_validation_raw) <- data_microbiome_validation_raw[1,]
## Harmonize sample IDs with the lipidome sheet (trim whitespace, drop
## everything from the first '.', prefix 'K'), then attach the diet labels
data_microbiome_validation_raw <- data_microbiome_validation_raw[-1, ] %>%
mutate(
ID = paste0("K", gsub("\\..*", "", trimws(ID)))) %>%
left_join(data_lipids_validation, by = 'ID') %>%
mutate(Data = 'valid', Sample = ID, Diet = X2) %>%
select(Sample, Data, Diet, everything()) %>%
select(-ID, -X2)
dim(data_microbiome_validation_raw)
## [1] 103 683
3.1.3 Get centered log-ratio (CLR) transformed values
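The chunk below filters taxa by prevalence, re-closes each sample to relative abundances, imputes zeros with `lrSVD()` and applies the centered log-ratio transform, clr(x) = log(x / g(x)), where g(x) is the geometric mean of the sample's parts. A base-R sketch of the same sequence on a made-up count matrix follows. Note that `lrSVD()` is swapped here for a simple multiplicative replacement with a fixed delta (`mult_repl()`, a hypothetical helper), which is easier to show but is not the imputation used in this report; `clr_base()` is likewise a hypothetical stand-in for `clr()`.

```r
# Made-up counts: 4 samples x 4 taxa; taxon D is present in only 1 of 4 samples
counts <- matrix(c(10, 0, 5, 0,
                   20, 4, 0, 0,
                    0, 6, 3, 1,
                    8, 2, 2, 0),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("S", 1:4), c("A", "B", "C", "D")))

# 1) Prevalence filter: keep taxa that are non-zero in >= 30% of samples
keep <- colMeans(counts != 0) >= 0.3
x <- counts[, keep, drop = FALSE]

# 2) Closure: rescale each sample (row) to proportions summing to 1
x <- x / rowSums(x)

# 3) Zero replacement (stand-in for lrSVD): zeros become delta and the
#    non-zero parts are shrunk so that each row still sums to 1
mult_repl <- function(p, delta = 1e-3) {
  z <- p == 0
  p <- p * (1 - rowSums(z) * delta)
  p[z] <- delta
  p
}
x <- mult_repl(x)

# 4) CLR: log of each part minus the mean log of its sample
clr_base <- function(p) {
  lp <- log(p)
  lp - rowMeans(lp)
}
clr_x <- clr_base(x)
print(round(clr_x, 3))
```

Each CLR row sums to zero, so CLR values describe abundances relative to the sample's own geometric mean rather than absolute amounts.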
set.seed(478)
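## Training data
metadata <- data_microbiome_original_raw[, c("Sample", "Country", "Diet")]
## Drop the metadata columns, convert to numeric, and keep taxa detected
## (non-zero) in at least 30% of training samples
bacteria_d <- data_microbiome_original_raw[, -c(1:4)] %>%
mutate(across(everything(), as.numeric)) %>%
select(where(~ mean(. != 0) >= 0.3))
rel_taxons <- c(colnames(bacteria_d))
## Re-close the composition: relative abundances summing to 1 per sample
bacteria_data <- bacteria_d / rowSums(bacteria_d)
dim(bacteria_data)
## [1] 166 299
## Impute zeros (label = 0) by log-ratio SVD with one component (ncp = 1);
## z.delete = FALSE keeps parts/samples even if their share of zeros exceeds z.warning
bacteria_data <- lrSVD(bacteria_data,
label = 0,
dl = NULL,
z.warning = 0.9,
z.delete = FALSE,
ncp = 1)
## Centered log-ratio transformation
clr_bacteria_data <- clr(bacteria_data)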
data_microbiome_original <- cbind(metadata, clr_bacteria_data)
if (!file.exists('gitignore/data_microbiome_SGB30_training_impCLR.csv')) {
write.csv(data_microbiome_original,
'gitignore/data_microbiome_SGB30_training_impCLR.csv')
}
## Variance of the CLR values across taxa, per sample
data_variance <- data_microbiome_original %>%
rowwise() %>%
mutate(variance = var(c_across(-(Sample:Diet)))) %>%
ungroup() %>%
select(Sample, variance)
## Look at the distribution
hist(data_variance$variance)
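The per-sample variance computed above with `rowwise()` and `c_across()` can also be obtained in base R with `apply()`. The sketch below does this on a made-up CLR matrix and flags samples with unusually high variance using a robust z-score (median and MAD); the threshold of 3 and the flagging rule are hypothetical, not a criterion taken from this report.

```r
# Made-up CLR matrix (each row sums to 0); S5 is deliberately extreme
clr_m <- rbind(
  S1 = c(0.1, -0.2, 0.1),
  S2 = c(0.2, -0.1, -0.1),
  S3 = c(-0.1, 0.0, 0.1),
  S4 = c(0.0, 0.1, -0.1),
  S5 = c(3.0, -4.0, 1.0)
)

# Per-sample variance across taxa
v <- apply(clr_m, 1, var)

# Hypothetical flagging rule: robust z-score above 3 (median / MAD)
z <- (v - median(v)) / mad(v)
flagged <- names(v)[z > 3]
print(flagged)
```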
## Show extreme samples
## Validation data
metadata <- data_microbiome_validation_raw[, c("Sample", "Data", "Diet")]
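## Convert to numeric and keep exactly the taxa retained in the training data
bacteria_d <- data_microbiome_validation_raw[, -(1:3)] %>%
mutate(across(everything(), as.numeric)) %>%
select(all_of(rel_taxons))
bacteria_data <- bacteria_d / rowSums(bacteria_d)
## Smallest column total: a positive value means no retained taxon is absent
## from all validation samples
min(colSums(bacteria_data))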
## [1] 0.0004796078
bacteria_data <- lrSVD(bacteria_data,
label = 0,
dl = NULL,
z.warning = 0.9,
z.delete = FALSE,
ncp = 1)
clr_bacteria_data <- clr(bacteria_data)
data_microbiome_validation <- cbind(metadata, clr_bacteria_data)
## Sample K284 has no diet label in the metadata; set it to VEGAN
data_microbiome_validation[
which(data_microbiome_validation$Sample == 'K284'), 'Diet'
] <- 'VEGAN'
data_microbiome_validation$`Wujia_chipingensis|SGB5111`
## [1] -2.2659270 -1.8366784 -1.5719579 -1.5649595 -1.3519655 -1.9273808
## [7] -0.5872619 -1.9902827 -0.9982257 -1.6966210 -2.1930395 -0.1319288
## [13] -2.1260143 -1.9245373 -2.0088636 0.5526297 3.9926563 -0.7907929
## [19] -2.1891711 -1.9395276 -2.1353811 -2.1654279 -2.2360908 -0.3930113
## [25] -2.3957911 -0.2977329 -1.4533578 -1.4072936 -2.0794670 1.8049062
## [31] 1.5258843 3.6617309 3.8680214 -2.0412347 0.4626366 0.3866172
## [37] 5.8044209 -2.2201075 -0.4444399 -2.1377634 -2.1399921 -2.0933531
## [43] 4.5931692 -1.6871195 2.8164681 -2.5644101 -2.0585439 -2.6629664
## [49] 0.6723405 4.4534434 4.1610574 5.3832639 -1.6933290 -2.5382040
## [55] -2.4107280 -2.1198776 5.3413842 -1.8967301 -1.5487724 -2.1396917
## [61] -1.7628668 4.1335815 5.0872100 3.6031291 -1.8040730 4.9799740
## [67] -0.9850500 -1.5123666 -0.5770878 -1.7964678 -2.1354643 2.3798401
## [73] 5.9776544 -2.2009657 -2.4395272 4.0615507 -1.3941153 -2.3882306
## [79] -2.2799115 4.6054080 -1.5516203 -0.4367249 -1.5615449 4.0790976
## [85] 1.0917225 3.7139508 -1.8139512 -1.1749955 -1.6831395 -1.5614355
## [91] -1.9565894 -0.1425388 -1.0454574 -2.1128182 4.9182020 -1.6631032
## [97] -2.3185501 -1.6734295 -1.6233100 -1.5638780 3.3149668 4.7828809
## [103] -1.5505336
# data_microbiome_validation <- data_microbiome_validation %>%
# rename("Wujia_chipingenss|SGB5111" = "Wujia_chipingensis|SGB5111")
if (!file.exists('gitignore/data_microbiome_SGB30_validation_impCLR.csv')) {
write.csv(data_microbiome_validation,
'gitignore/data_microbiome_SGB30_validation_impCLR.csv')
}
3.1.4 Merge training and validation datasets
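The chunk below prints the training and validation column names side by side so that they can be compared by eye. With 299 taxa, the same check is less error-prone when done programmatically; the sketch then stacks the two cohorts on their shared columns. The toy tables and the `Cohort` column are hypothetical, and the stacking step is one possible approach rather than the report's own merge code.

```r
# Made-up miniature versions of the two CLR tables
train <- data.frame(Sample = c("T1", "T2"), Country = c("CZ", "IT"),
                    Diet = c("VEGAN", "OMNI"),
                    `Taxon_A|SGB1` = c(0.5, -0.5), `Taxon_B|SGB2` = c(-0.5, 0.5),
                    check.names = FALSE, stringsAsFactors = FALSE)
valid <- data.frame(Sample = "K1", Data = "valid", Diet = "VEGAN",
                    `Taxon_A|SGB1` = 0.2, `Taxon_B|SGB2` = -0.2,
                    check.names = FALSE, stringsAsFactors = FALSE)

# Taxon columns are everything except the metadata columns
taxa_train <- setdiff(colnames(train), c("Sample", "Country", "Diet"))
taxa_valid <- setdiff(colnames(valid), c("Sample", "Data", "Diet"))

# Same taxa in the same order in both cohorts?
stopifnot(identical(taxa_train, taxa_valid))

# Stack the cohorts on their shared columns, recording the source cohort
shared <- c("Sample", "Diet", taxa_train)
merged <- rbind(
  data.frame(Cohort = "training", train[, shared], check.names = FALSE,
             stringsAsFactors = FALSE),
  data.frame(Cohort = "validation", valid[, shared], check.names = FALSE,
             stringsAsFactors = FALSE)
)
print(merged)
```

`check.names = FALSE` matters here: without it, `data.frame()` would rewrite names such as `Taxon_A|SGB1` into syntactic names and the taxon columns would no longer match the original labels.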
cbind(
colnames(data_microbiome_original),
colnames(data_microbiome_validation))
## [,1]
## [1,] "Sample"
## [2,] "Country"
## [3,] "Diet"
## [4,] "Bacteroides_stercoris|SGB1830"
## [5,] "Alistipes_putredinis|SGB2318"
## [6,] "Candidatus_Cibionibacter_quicibialis|SGB15286"
## [7,] "Bacteroides_uniformis|SGB1836"
## [8,] "Eubacterium_siraeum|SGB4198"
## [9,] "GGB9602_SGB15031|SGB15031"
## [10,] "Phocaeicola_vulgatus|SGB1814"
## [11,] "Faecalibacterium_prausnitzii|SGB15316"
## [12,] "Lachnospiraceae_bacterium_CLA_AA_H244|SGB4993"
## [13,] "Sutterella_wadsworthensis|SGB9283"
## [14,] "Oscillibacter_sp_ER4|SGB15254"
## [15,] "Parabacteroides_distasonis|SGB1934"
## [16,] "Parabacteroides_merdae|SGB1949"
## [17,] "Oscillibacter_valericigenes|SGB15053"
## [18,] "GGB47687_SGB2286|SGB2286"
## [19,] "Brotolimicola_acetigignens|SGB4914"
## [20,] "GGB9747_SGB15356|SGB15356"
## [21,] "Faecalibacterium_prausnitzii|SGB15339"
## [22,] "Alistipes_shahii|SGB2295"
## [23,] "Alistipes_onderdonkii|SGB2303"
## [24,] "Faecalibacterium_SGB15346|SGB15346"
## [25,] "Dialister_invisus|SGB5825_group"
## [26,] "Faecalibacterium_prausnitzii|SGB15332"
## [27,] "Gemmiger_formicilis|SGB15300"
## [28,] "GGB33469_SGB15236|SGB15236"
## [29,] "Bacteroides_eggerthii|SGB1829"
## [30,] "Coprococcus_eutactus|SGB5117"
## [31,] "Alistipes_communis|SGB2290"
## [32,] "Faecalibacterium_prausnitzii|SGB15322"
## [33,] "Escherichia_coli|SGB10068"
## [34,] "Faecalibacterium_prausnitzii|SGB15318"
## [35,] "GGB3175_SGB4191|SGB4191"
## [36,] "Faecalibacterium_prausnitzii|SGB15342"
## [37,] "Fusicatenibacter_saccharivorans|SGB4874"
## [38,] "Intestinimonas_massiliensis|SGB15127"
## [39,] "Vescimonas_coprocola|SGB15089"
## [40,] "Oscillibacter_sp_MSJ_31|SGB15249"
## [41,] "Bacteroides_caccae|SGB1877"
## [42,] "Oscillospiraceae_bacterium_CLA_AA_H250|SGB14861"
## [43,] "Bifidobacterium_longum|SGB17248"
## [44,] "Alistipes_senegalensis|SGB2296"
## [45,] "Roseburia_intestinalis|SGB4951"
## [46,] "Dysosmobacter_welbionis|SGB15078"
## [47,] "Eubacterium_rectale|SGB4933"
## [48,] "Alistipes_ihumii|SGB2328"
## [49,] "Roseburia_faecis|SGB4925"
## [50,] "GGB9699_SGB15216|SGB15216"
## [51,] "Oscillospiraceae_bacterium|SGB15225"
## [52,] "Faecalibacterium_prausnitzii|SGB15317"
## [53,] "GGB9715_SGB15267|SGB15267"
## [54,] "GGB9667_SGB15164|SGB15164"
## [55,] "Lachnospira_pectinoschiza|SGB5075"
## [56,] "Bacteroides_clarus|SGB1832"
## [57,] "Anaerotignum_faecicola|SGB5190"
## [58,] "Hydrogenoanaerobacterium_saccharovorans|SGB15350"
## [59,] "Bacteroides_faecis|SGB1860"
## [60,] "Roseburia_hominis|SGB4936"
## [61,] "GGB9712_SGB15244|SGB15244"
## [62,] "Ruminococcus_bicirculans|SGB4262"
## [63,] "Bilophila_wadsworthia|SGB15452"
## [64,] "Odoribacter_splanchnicus|SGB1790"
## [65,] "GGB9365_SGB14341|SGB14341"
## [66,] "GGB9715_SGB15265|SGB15265"
## [67,] "GGB9730_SGB15291|SGB15291"
## [68,] "Bifidobacterium_adolescentis|SGB17244"
## [69,] "Bacteroides_ovatus|SGB1871"
## [70,] "GGB9614_SGB15049|SGB15049"
## [71,] "Simiaoa_sunii|SGB4910"
## [72,] "GGB9759_SGB15370|SGB15370"
## [73,] "Barnesiella_intestinihominis|SGB1965"
## [74,] "Agathobaculum_butyriciproducens|SGB14993"
## [75,] "Lachnospira_eligens|SGB5082"
## [76,] "Lacrimispora_amygdalina|SGB4716"
## [77,] "Akkermansia_muciniphila|SGB9226"
## [78,] "Clostridiales_bacterium|SGB15143"
## [79,] "GGB6649_SGB9391|SGB9391"
## [80,] "Ruminococcus_bromii|SGB4285"
## [81,] "Gemmiger_formicilis|SGB15299"
## [82,] "Alistipes_sp_AF17_16|SGB2326"
## [83,] "Clostridiaceae_bacterium|SGB4269"
## [84,] "GGB9760_SGB15373|SGB15373"
## [85,] "Clostridiaceae_bacterium_AF18_31LB|SGB4767"
## [86,] "GGB9621_SGB15073|SGB15073"
## [87,] "Bacteroides_cellulosilyticus|SGB1844"
## [88,] "Faecalibacterium_sp_CLA_AA_H233|SGB15315"
## [89,] "Ruthenibacterium_lactatiformans|SGB15271"
## [90,] "Lawsonibacter_asaccharolyticus|SGB15154"
## [91,] "Clostridium_sp_AF20_17LB|SGB4714"
## [92,] "Phocaeicola_massiliensis|SGB1812"
## [93,] "GGB9760_SGB15374|SGB15374"
## [94,] "Clostridium_fessum|SGB4705"
## [95,] "GGB33586_SGB53517|SGB53517"
## [96,] "Clostridium_sp_AF36_4|SGB4644"
## [97,] "Clostridium_SGB4909|SGB4909"
## [98,] "GGB13404_SGB14252|SGB14252"
## [99,] "GGB9627_SGB15081|SGB15081"
## [100,] "GGB9608_SGB15041|SGB15041"
## [101,] "GGB9707_SGB15229|SGB15229"
## [102,] "Parasutterella_excrementihominis|SGB9262"
## [103,] "Roseburia_inulinivorans|SGB4940"
## [104,] "Phocaeicola_dorei|SGB1815"
## [105,] "Oscillibacter_valericigenes|SGB15124"
## [106,] "GGB9502_SGB14899|SGB14899"
## [107,] "GGB9818_SGB15459|SGB15459"
## [108,] "GGB9767_SGB15385|SGB15385"
## [109,] "Clostridiaceae_bacterium|SGB4770"
## [110,] "Bacteroides_thetaiotaomicron|SGB1861"
## [111,] "GGB9559_SGB14969|SGB14969"
## [112,] "Flavonifractor_plautii|SGB15132"
## [113,] "Fusicatenibacter_sp_CLA_AA_H277|SGB4780"
## [114,] "Lachnospiraceae_bacterium_AM48_27BH|SGB4706"
## [115,] "GGB9063_SGB13982|SGB13982"
## [116,] "Collinsella_aerofaciens|SGB14535"
## [117,] "GGB9522_SGB14921|SGB14921"
## [118,] "GGB9619_SGB15067|SGB15067"
## [119,] "Clostridium_leptum|SGB14853"
## [120,] "GGB52130_SGB14966|SGB14966"
## [121,] "Clostridium_sp_AF27_2AA|SGB4712"
## [122,] "Blautia_sp_MCC283|SGB4828"
## [123,] "Clostridiales_bacterium_KLE1615|SGB5090"
## [124,] "Lachnospira_sp_NSJ_43|SGB5087"
## [125,] "Butyricimonas_virosa|SGB1784"
## [126,] "GGB9509_SGB14906|SGB14906"
## [127,] "Coprococcus_catus|SGB4670"
## [128,] "GGB4585_SGB6340|SGB6340"
## [129,] "Phascolarctobacterium_faecium|SGB5792"
## [130,] "Clostridiaceae_bacterium_Marseille_Q4143|SGB4768"
## [131,] "GGB45432_SGB63101|SGB63101"
## [132,] "Clostridium_sp_AM33_3|SGB4711"
## [133,] "GGB9342_SGB14306|SGB14306"
## [134,] "Faecalicatena_fissicatena|SGB4871"
## [135,] "Alistipes_finegoldii|SGB2301"
## [136,] "Oscillospiraceae_bacterium_Marseille_Q3528|SGB4778"
## [137,] "Faecalibacterium_sp_HTFF|SGB15340"
## [138,] "GGB3653_SGB4964|SGB4964"
## [139,] "Intestinimonas_butyriciproducens|SGB15126"
## [140,] "GGB9719_SGB15272|SGB15272"
## [141,] "Blautia_glucerasea|SGB4816"
## [142,] "GGB2653_SGB3574|SGB3574"
## [143,] "Veillonella_dispar|SGB6952"
## [144,] "bacterium_210917_DFI_7_65|SGB14999"
## [145,] "GGB9646_SGB15123|SGB15123"
## [146,] "GGB51441_SGB71759|SGB71759"
## [147,] "GGB33512_SGB15203|SGB15203"
## [148,] "Alistipes_indistinctus|SGB2325"
## [149,] "Candidatus_Borkfalkia_ceftriaxoniphila|SGB14027"
## [150,] "GGB9062_SGB13981|SGB13981"
## [151,] "Lachnospiraceae_bacterium|SGB4781"
## [152,] "Phocea_massiliensis|SGB14837"
## [153,] "Guopingia_tenuis|SGB14127"
## [154,] "GGB3109_SGB4121|SGB4121"
## [155,] "GGB9534_SGB14937|SGB14937"
## [156,] "Anaerotruncus_rubiinfantis|SGB25416"
## [157,] "Lachnospiraceae_bacterium|SGB4953"
## [158,] "Oscillibacter_sp_PC13|SGB7258"
## [159,] "Blautia_SGB4815|SGB4815"
## [160,] "Lawsonibacter_hominis|SGB15131"
## [161,] "Clostridium_SGB4750|SGB4750"
## [162,] "Holdemania_filiformis|SGB4046"
## [163,] "Anaeromassilibacillus_senegalensis|SGB14894"
## [164,] "Lachnospira_pectinoschiza|SGB5089"
## [165,] "Anaerofilum_hominis|SGB79822"
## [166,] "Lachnotalea_sp_AF33_28|SGB5200"
## [167,] "Senegalimassilia_anaerobia|SGB14824_group"
## [168,] "GGB9345_SGB14311|SGB14311"
## [169,] "Blautia_obeum|SGB4811"
## [170,] "Ligaoa_zhengdingensis|SGB14839"
## [171,] "Adlercreutzia_equolifaciens|SGB14797"
## [172,] "Blautia_wexlerae|SGB4837"
## [173,] "GGB9787_SGB15410|SGB15410"
## [174,] "GGB36331_SGB15121|SGB15121"
## [175,] "Clostridium_sp_AM22_11AC|SGB4749"
## [176,] "Ruminococcus_sp_AF41_9|SGB25497"
## [177,] "Anaerotruncus_colihominis|SGB14963"
## [178,] "Eubacterium_ramulus|SGB4959"
## [179,] "Enterocloster_lavalensis|SGB4725"
## [180,] "GGB2848_SGB3813|SGB3813"
## [181,] "Blautia_faecis|SGB4820"
## [182,] "Clostridiaceae_bacterium_Marseille_Q4145|SGB4769"
## [183,] "Roseburia_sp_AF02_12|SGB4938"
## [184,] "Butyricimonas_faecihominis|SGB1786"
## [185,] "Youxingia_wuxianensis|SGB82503"
## [186,] "Intestinimonas_gabonensis|SGB79840"
## [187,] "GGB9064_SGB13983|SGB13983"
## [188,] "GGB45613_SGB63326|SGB63326"
## [189,] "GGB4552_SGB6276|SGB6276"
## [190,] "Blautia_massiliensis|SGB4826"
## [191,] "Lachnospiraceae_bacterium_CLA_AA_H215|SGB4777"
## [192,] "Dorea_formicigenerans|SGB4575"
## [193,] "Clostridia_bacterium_UC5_1_1D1|SGB14995"
## [194,] "GGB9765_SGB15382|SGB15382"
## [195,] "GGB9770_SGB15390|SGB15390"
## [196,] "Clostridium_sp_AM49_4BH|SGB4652"
## [197,] "GGB3537_SGB4727|SGB4727"
## [198,] "Wansuia_hejianensis|SGB25431"
## [199,] "Coprococcus_eutactus|SGB5121"
## [200,] "Coprobacter_secundus|SGB1962"
## [201,] "GGB3510_SGB4687|SGB4687"
## [202,] "Mediterraneibacter_faecis|SGB4563"
## [203,] "Ruminococcus_torques|SGB4608"
## [204,] "Coprococcus_comes|SGB4577"
## [205,] "Streptococcus_parasanguinis|SGB8071"
## [206,] "Provencibacterium_massiliense|SGB14838"
## [207,] "Eubacterium_ventriosum|SGB5045"
## [208,] "Enterocloster_citroniae|SGB4761"
## [209,] "GGB51647_SGB4348|SGB4348"
## [210,] "GGB9635_SGB15106|SGB15106"
## [211,] "GGB9758_SGB15368|SGB15368"
## [212,] "GGB9708_SGB15234|SGB15234"
## [213,] "GGB3304_SGB4367|SGB4367"
## [214,] "Wujia_chipingensis|SGB5111"
## [215,] "Clostridiaceae_bacterium_Marseille_Q4149|SGB15091"
## [216,] "Paraprevotella_clara|SGB1798"
## [217,] "Agathobaculum_butyriciproducens|SGB14991"
## [218,] "GGB9296_SGB14253|SGB14253"
## [219,] "Butyricimonas_paravirosa|SGB1785"
## [220,] "Dorea_longicatena|SGB4581"
## [221,] "Ruminococcus_lactaris|SGB4557"
## [222,] "Alistipes_dispar|SGB2311"
## [223,] "Dysosmobacter_SGB15077|SGB15077"
## [224,] "GGB13489_SGB15224|SGB15224"
## [225,] "Clostridium_sp_AF12_28|SGB4715"
## [226,] "GGB34797_SGB14322|SGB14322"
## [227,] "Anaerobutyricum_hallii|SGB4532"
## [228,] "Clostridium_SGB48024|SGB48024"
## [229,] "Slackia_isoflavoniconvertens|SGB14773"
## [230,] "GGB3321_SGB4394|SGB4394"
## [231,] "GGB9237_SGB14179|SGB14179"
## [232,] "Anaerotruncus_massiliensis|SGB14965"
## [233,] "GGB9531_SGB14932|SGB14932"
## [234,] "GGB3619_SGB4894|SGB4894"
## [235,] "Streptococcus_salivarius|SGB8007_group"
## [236,] "Oscillibacter_valericigenes|SGB15076"
## [237,] "Anaerostipes_hadrus|SGB4540"
## [238,] "Blautia_wexlerae|SGB4831"
## [239,] "GGB2970_SGB3952|SGB3952"
## [240,] "Lawsonibacter_SGB15145|SGB15145"
## [241,] "GGB9563_SGB14975|SGB14975"
## [242,] "Blautia_glucerasea|SGB4804"
## [243,] "Anaerostipes_hadrus|SGB4547"
## [244,] "Lachnospiraceae_bacterium|SGB4782"
## [245,] "GGB2998_SGB3988|SGB3988"
## [246,] "GGB9616_SGB15052|SGB15052"
## [247,] "GGB9524_SGB14924|SGB14924"
## [248,] "Segatella_copri|SGB1626"
## [249,] "Mediterraneibacter_butyricigenes|SGB25493"
## [250,] "Haemophilus_parainfluenzae|SGB9712"
## [251,] "Anaerosacchariphilus_sp_NSJ_68|SGB4772"
## [252,] "Anaerotignum_sp_MSJ_24|SGB5180"
## [253,] "Dorea_sp_AF36_15AT|SGB4552"
## [254,] "GGB58158_SGB79798|SGB79798"
## [255,] "Pseudoflavonifractor_capillosus|SGB15140"
## [256,] "Oliverpabstia_intestinalis|SGB4868"
## [257,] "Veillonella_parvula|SGB6939"
## [258,] "Colidextribacter_sp_210702_DFI_3_9|SGB15146"
## [259,] "Coprobacter_fastidiosus|SGB1963"
## [260,] "Veillonella_rogosae|SGB6956"
## [261,] "Eubacteriaceae_bacterium|SGB3958"
## [262,] "TM7_phylum_sp_oral_taxon_352|SGB19860_group"
## [263,] "Enterocloster_hominis|SGB4721"
## [264,] "Rothia_mucilaginosa|SGB16971_group"
## [265,] "Blautia_luti|SGB4832"
## [266,] "Blautia_sp_OF03_15BH|SGB4779"
## [267,] "Streptococcus_australis|SGB8059_group"
## [268,] "Ruminococcus_gnavus|SGB4571"
## [269,] "Streptococcus_thermophilus|SGB8002"
## [270,] "Roseburia_hominis|SGB4659"
## [271,] "Lentihominibacter_faecis|SGB3957"
## [272,] "Actinomyces_sp_ICM47|SGB17167_group"
## [273,] "Evtepia_gabavorous|SGB15120"
## [274,] "Blautia_obeum|SGB4810"
## [275,] "GGB3480_SGB4648|SGB4648"
## [276,] "Ellagibacter_isourolithinifaciens|SGB14816_group"
## [277,] "GGB2982_SGB3964|SGB3964"
## [278,] "Anaerobutyricum_soehngenii|SGB4537"
## [279,] "Bacteroides_xylanisolvens|SGB1867"
## [280,] "Neobittarella_massiliensis|SGB7264"
## [281,] "Faecalibacillus_intestinalis|SGB6754"
## [282,] "GGB9708_SGB15233|SGB15233"
## [283,] "Enterocloster_asparagiformis|SGB4724"
## [284,] "GGB51884_SGB49168|SGB49168"
## [285,] "Enterocloster_aldenensis|SGB4762"
## [286,] "Eggerthella_lenta|SGB14809"
## [287,] "Bacteroides_fragilis|SGB1855"
## [288,] "Veillonella_atypica|SGB6936"
## [289,] "Enterocloster_bolteae|SGB4758"
## [290,] "Romboutsia_timonensis|SGB6148"
## [291,] "Roseburia_lenta|SGB4957"
## [292,] "GGB3288_SGB4342|SGB4342"
## [293,] "GGB9574_SGB14987|SGB14987"
## [294,] "Dorea_longicatena|SGB4582"
## [295,] "Eubacterium_sp_AF34_35BH|SGB5051"
## [296,] "GGB9640_SGB15115|SGB15115"
## [297,] "Clostridiaceae_unclassified_SGB4771|SGB4771"
## [298,] "GGB9766_SGB15383|SGB15383"
## [299,] "Lachnospiraceae_bacterium_OM04_12BH|SGB4893"
## [300,] "Clostridium_SGB6173|SGB6173"
## [301,] "GGB9616_SGB15051|SGB15051"
## [302,] "Intestinibacter_bartlettii|SGB6140"
## [,2]
## [1,] "Sample"
## [2,] "Data"
## [3,] "Diet"
## [4,] "Bacteroides_stercoris|SGB1830"
## [5,] "Alistipes_putredinis|SGB2318"
## [6,] "Candidatus_Cibionibacter_quicibialis|SGB15286"
## [7,] "Bacteroides_uniformis|SGB1836"
## [8,] "Eubacterium_siraeum|SGB4198"
## [9,] "GGB9602_SGB15031|SGB15031"
## [10,] "Phocaeicola_vulgatus|SGB1814"
## [11,] "Faecalibacterium_prausnitzii|SGB15316"
## [12,] "Lachnospiraceae_bacterium_CLA_AA_H244|SGB4993"
## [13,] "Sutterella_wadsworthensis|SGB9283"
## [14,] "Oscillibacter_sp_ER4|SGB15254"
## [15,] "Parabacteroides_distasonis|SGB1934"
## [16,] "Parabacteroides_merdae|SGB1949"
## [17,] "Oscillibacter_valericigenes|SGB15053"
## [18,] "GGB47687_SGB2286|SGB2286"
## [19,] "Brotolimicola_acetigignens|SGB4914"
## [20,] "GGB9747_SGB15356|SGB15356"
## [21,] "Faecalibacterium_prausnitzii|SGB15339"
## [22,] "Alistipes_shahii|SGB2295"
## [23,] "Alistipes_onderdonkii|SGB2303"
## [24,] "Faecalibacterium_SGB15346|SGB15346"
## [25,] "Dialister_invisus|SGB5825_group"
## [26,] "Faecalibacterium_prausnitzii|SGB15332"
## [27,] "Gemmiger_formicilis|SGB15300"
## [28,] "GGB33469_SGB15236|SGB15236"
## [29,] "Bacteroides_eggerthii|SGB1829"
## [30,] "Coprococcus_eutactus|SGB5117"
## [31,] "Alistipes_communis|SGB2290"
## [32,] "Faecalibacterium_prausnitzii|SGB15322"
## [33,] "Escherichia_coli|SGB10068"
## [34,] "Faecalibacterium_prausnitzii|SGB15318"
## [35,] "GGB3175_SGB4191|SGB4191"
## [36,] "Faecalibacterium_prausnitzii|SGB15342"
## [37,] "Fusicatenibacter_saccharivorans|SGB4874"
## [38,] "Intestinimonas_massiliensis|SGB15127"
## [39,] "Vescimonas_coprocola|SGB15089"
## [40,] "Oscillibacter_sp_MSJ_31|SGB15249"
## [41,] "Bacteroides_caccae|SGB1877"
## [42,] "Oscillospiraceae_bacterium_CLA_AA_H250|SGB14861"
## [43,] "Bifidobacterium_longum|SGB17248"
## [44,] "Alistipes_senegalensis|SGB2296"
## [45,] "Roseburia_intestinalis|SGB4951"
## [46,] "Dysosmobacter_welbionis|SGB15078"
## [47,] "Eubacterium_rectale|SGB4933"
## [48,] "Alistipes_ihumii|SGB2328"
## [49,] "Roseburia_faecis|SGB4925"
## [50,] "GGB9699_SGB15216|SGB15216"
## [51,] "Oscillospiraceae_bacterium|SGB15225"
## [52,] "Faecalibacterium_prausnitzii|SGB15317"
## [53,] "GGB9715_SGB15267|SGB15267"
## [54,] "GGB9667_SGB15164|SGB15164"
## [55,] "Lachnospira_pectinoschiza|SGB5075"
## [56,] "Bacteroides_clarus|SGB1832"
## [57,] "Anaerotignum_faecicola|SGB5190"
## [58,] "Hydrogenoanaerobacterium_saccharovorans|SGB15350"
## [59,] "Bacteroides_faecis|SGB1860"
## [60,] "Roseburia_hominis|SGB4936"
## [61,] "GGB9712_SGB15244|SGB15244"
## [62,] "Ruminococcus_bicirculans|SGB4262"
## [63,] "Bilophila_wadsworthia|SGB15452"
## [64,] "Odoribacter_splanchnicus|SGB1790"
## [65,] "GGB9365_SGB14341|SGB14341"
## [66,] "GGB9715_SGB15265|SGB15265"
## [67,] "GGB9730_SGB15291|SGB15291"
## [68,] "Bifidobacterium_adolescentis|SGB17244"
## [69,] "Bacteroides_ovatus|SGB1871"
## [70,] "GGB9614_SGB15049|SGB15049"
## [71,] "Simiaoa_sunii|SGB4910"
## [72,] "GGB9759_SGB15370|SGB15370"
## [73,] "Barnesiella_intestinihominis|SGB1965"
## [74,] "Agathobaculum_butyriciproducens|SGB14993"
## [75,] "Lachnospira_eligens|SGB5082"
## [76,] "Lacrimispora_amygdalina|SGB4716"
## [77,] "Akkermansia_muciniphila|SGB9226"
## [78,] "Clostridiales_bacterium|SGB15143"
## [79,] "GGB6649_SGB9391|SGB9391"
## [80,] "Ruminococcus_bromii|SGB4285"
## [81,] "Gemmiger_formicilis|SGB15299"
## [82,] "Alistipes_sp_AF17_16|SGB2326"
## [83,] "Clostridiaceae_bacterium|SGB4269"
## [84,] "GGB9760_SGB15373|SGB15373"
## [85,] "Clostridiaceae_bacterium_AF18_31LB|SGB4767"
## [86,] "GGB9621_SGB15073|SGB15073"
## [87,] "Bacteroides_cellulosilyticus|SGB1844"
## [88,] "Faecalibacterium_sp_CLA_AA_H233|SGB15315"
## [89,] "Ruthenibacterium_lactatiformans|SGB15271"
## [90,] "Lawsonibacter_asaccharolyticus|SGB15154"
## [91,] "Clostridium_sp_AF20_17LB|SGB4714"
## [92,] "Phocaeicola_massiliensis|SGB1812"
## [93,] "GGB9760_SGB15374|SGB15374"
## [94,] "Clostridium_fessum|SGB4705"
## [95,] "GGB33586_SGB53517|SGB53517"
## [96,] "Clostridium_sp_AF36_4|SGB4644"
## [97,] "Clostridium_SGB4909|SGB4909"
## [98,] "GGB13404_SGB14252|SGB14252"
## [99,] "GGB9627_SGB15081|SGB15081"
## [100,] "GGB9608_SGB15041|SGB15041"
## [101,] "GGB9707_SGB15229|SGB15229"
## [102,] "Parasutterella_excrementihominis|SGB9262"
## [103,] "Roseburia_inulinivorans|SGB4940"
## [104,] "Phocaeicola_dorei|SGB1815"
## [105,] "Oscillibacter_valericigenes|SGB15124"
## [106,] "GGB9502_SGB14899|SGB14899"
## [107,] "GGB9818_SGB15459|SGB15459"
## [108,] "GGB9767_SGB15385|SGB15385"
## [109,] "Clostridiaceae_bacterium|SGB4770"
## [110,] "Bacteroides_thetaiotaomicron|SGB1861"
## [111,] "GGB9559_SGB14969|SGB14969"
## [112,] "Flavonifractor_plautii|SGB15132"
## [113,] "Fusicatenibacter_sp_CLA_AA_H277|SGB4780"
## [114,] "Lachnospiraceae_bacterium_AM48_27BH|SGB4706"
## [115,] "GGB9063_SGB13982|SGB13982"
## [116,] "Collinsella_aerofaciens|SGB14535"
## [117,] "GGB9522_SGB14921|SGB14921"
## [118,] "GGB9619_SGB15067|SGB15067"
## [119,] "Clostridium_leptum|SGB14853"
## [120,] "GGB52130_SGB14966|SGB14966"
## [121,] "Clostridium_sp_AF27_2AA|SGB4712"
## [122,] "Blautia_sp_MCC283|SGB4828"
## [123,] "Clostridiales_bacterium_KLE1615|SGB5090"
## [124,] "Lachnospira_sp_NSJ_43|SGB5087"
## [125,] "Butyricimonas_virosa|SGB1784"
## [126,] "GGB9509_SGB14906|SGB14906"
## [127,] "Coprococcus_catus|SGB4670"
## [128,] "GGB4585_SGB6340|SGB6340"
## [129,] "Phascolarctobacterium_faecium|SGB5792"
## [130,] "Clostridiaceae_bacterium_Marseille_Q4143|SGB4768"
## [131,] "GGB45432_SGB63101|SGB63101"
## [132,] "Clostridium_sp_AM33_3|SGB4711"
## [133,] "GGB9342_SGB14306|SGB14306"
## [134,] "Faecalicatena_fissicatena|SGB4871"
## [135,] "Alistipes_finegoldii|SGB2301"
## [136,] "Oscillospiraceae_bacterium_Marseille_Q3528|SGB4778"
## [137,] "Faecalibacterium_sp_HTFF|SGB15340"
## [138,] "GGB3653_SGB4964|SGB4964"
## [139,] "Intestinimonas_butyriciproducens|SGB15126"
## [140,] "GGB9719_SGB15272|SGB15272"
## [141,] "Blautia_glucerasea|SGB4816"
## [142,] "GGB2653_SGB3574|SGB3574"
## [143,] "Veillonella_dispar|SGB6952"
## [144,] "bacterium_210917_DFI_7_65|SGB14999"
## [145,] "GGB9646_SGB15123|SGB15123"
## [146,] "GGB51441_SGB71759|SGB71759"
## [147,] "GGB33512_SGB15203|SGB15203"
## [148,] "Alistipes_indistinctus|SGB2325"
## [149,] "Candidatus_Borkfalkia_ceftriaxoniphila|SGB14027"
## [150,] "GGB9062_SGB13981|SGB13981"
## [151,] "Lachnospiraceae_bacterium|SGB4781"
## [152,] "Phocea_massiliensis|SGB14837"
## [153,] "Guopingia_tenuis|SGB14127"
## [154,] "GGB3109_SGB4121|SGB4121"
## [155,] "GGB9534_SGB14937|SGB14937"
## [156,] "Anaerotruncus_rubiinfantis|SGB25416"
## [157,] "Lachnospiraceae_bacterium|SGB4953"
## [158,] "Oscillibacter_sp_PC13|SGB7258"
## [159,] "Blautia_SGB4815|SGB4815"
## [160,] "Lawsonibacter_hominis|SGB15131"
## [161,] "Clostridium_SGB4750|SGB4750"
## [162,] "Holdemania_filiformis|SGB4046"
## [163,] "Anaeromassilibacillus_senegalensis|SGB14894"
## [164,] "Lachnospira_pectinoschiza|SGB5089"
## [165,] "Anaerofilum_hominis|SGB79822"
## [166,] "Lachnotalea_sp_AF33_28|SGB5200"
## [167,] "Senegalimassilia_anaerobia|SGB14824_group"
## [168,] "GGB9345_SGB14311|SGB14311"
## [169,] "Blautia_obeum|SGB4811"
## [170,] "Ligaoa_zhengdingensis|SGB14839"
## [171,] "Adlercreutzia_equolifaciens|SGB14797"
## [172,] "Blautia_wexlerae|SGB4837"
## [173,] "GGB9787_SGB15410|SGB15410"
## [174,] "GGB36331_SGB15121|SGB15121"
## [175,] "Clostridium_sp_AM22_11AC|SGB4749"
## [176,] "Ruminococcus_sp_AF41_9|SGB25497"
## [177,] "Anaerotruncus_colihominis|SGB14963"
## [178,] "Eubacterium_ramulus|SGB4959"
## [179,] "Enterocloster_lavalensis|SGB4725"
## [180,] "GGB2848_SGB3813|SGB3813"
## [181,] "Blautia_faecis|SGB4820"
## [182,] "Clostridiaceae_bacterium_Marseille_Q4145|SGB4769"
## [183,] "Roseburia_sp_AF02_12|SGB4938"
## [184,] "Butyricimonas_faecihominis|SGB1786"
## [185,] "Youxingia_wuxianensis|SGB82503"
## [186,] "Intestinimonas_gabonensis|SGB79840"
## [187,] "GGB9064_SGB13983|SGB13983"
## [188,] "GGB45613_SGB63326|SGB63326"
## [189,] "GGB4552_SGB6276|SGB6276"
## [190,] "Blautia_massiliensis|SGB4826"
## [191,] "Lachnospiraceae_bacterium_CLA_AA_H215|SGB4777"
## [192,] "Dorea_formicigenerans|SGB4575"
## [193,] "Clostridia_bacterium_UC5_1_1D1|SGB14995"
## [194,] "GGB9765_SGB15382|SGB15382"
## [195,] "GGB9770_SGB15390|SGB15390"
## [196,] "Clostridium_sp_AM49_4BH|SGB4652"
## [197,] "GGB3537_SGB4727|SGB4727"
## [198,] "Wansuia_hejianensis|SGB25431"
## [199,] "Coprococcus_eutactus|SGB5121"
## [200,] "Coprobacter_secundus|SGB1962"
## [201,] "GGB3510_SGB4687|SGB4687"
## [202,] "Mediterraneibacter_faecis|SGB4563"
## [203,] "Ruminococcus_torques|SGB4608"
## [204,] "Coprococcus_comes|SGB4577"
## [205,] "Streptococcus_parasanguinis|SGB8071"
## [206,] "Provencibacterium_massiliense|SGB14838"
## [207,] "Eubacterium_ventriosum|SGB5045"
## [208,] "Enterocloster_citroniae|SGB4761"
## [209,] "GGB51647_SGB4348|SGB4348"
## [210,] "GGB9635_SGB15106|SGB15106"
## [211,] "GGB9758_SGB15368|SGB15368"
## [212,] "GGB9708_SGB15234|SGB15234"
## [213,] "GGB3304_SGB4367|SGB4367"
## [214,] "Wujia_chipingensis|SGB5111"
## [215,] "Clostridiaceae_bacterium_Marseille_Q4149|SGB15091"
## [216,] "Paraprevotella_clara|SGB1798"
## [217,] "Agathobaculum_butyriciproducens|SGB14991"
## [218,] "GGB9296_SGB14253|SGB14253"
## [219,] "Butyricimonas_paravirosa|SGB1785"
## [220,] "Dorea_longicatena|SGB4581"
## [221,] "Ruminococcus_lactaris|SGB4557"
## [222,] "Alistipes_dispar|SGB2311"
## [223,] "Dysosmobacter_SGB15077|SGB15077"
## [224,] "GGB13489_SGB15224|SGB15224"
## [225,] "Clostridium_sp_AF12_28|SGB4715"
## [226,] "GGB34797_SGB14322|SGB14322"
## [227,] "Anaerobutyricum_hallii|SGB4532"
## [228,] "Clostridium_SGB48024|SGB48024"
## [229,] "Slackia_isoflavoniconvertens|SGB14773"
## [230,] "GGB3321_SGB4394|SGB4394"
## [231,] "GGB9237_SGB14179|SGB14179"
## [232,] "Anaerotruncus_massiliensis|SGB14965"
## [233,] "GGB9531_SGB14932|SGB14932"
## [234,] "GGB3619_SGB4894|SGB4894"
## [235,] "Streptococcus_salivarius|SGB8007_group"
## [236,] "Oscillibacter_valericigenes|SGB15076"
## [237,] "Anaerostipes_hadrus|SGB4540"
## [238,] "Blautia_wexlerae|SGB4831"
## [239,] "GGB2970_SGB3952|SGB3952"
## [240,] "Lawsonibacter_SGB15145|SGB15145"
## [241,] "GGB9563_SGB14975|SGB14975"
## [242,] "Blautia_glucerasea|SGB4804"
## [243,] "Anaerostipes_hadrus|SGB4547"
## [244,] "Lachnospiraceae_bacterium|SGB4782"
## [245,] "GGB2998_SGB3988|SGB3988"
## [246,] "GGB9616_SGB15052|SGB15052"
## [247,] "GGB9524_SGB14924|SGB14924"
## [248,] "Segatella_copri|SGB1626"
## [249,] "Mediterraneibacter_butyricigenes|SGB25493"
## [250,] "Haemophilus_parainfluenzae|SGB9712"
## [251,] "Anaerosacchariphilus_sp_NSJ_68|SGB4772"
## [252,] "Anaerotignum_sp_MSJ_24|SGB5180"
## [253,] "Dorea_sp_AF36_15AT|SGB4552"
## [254,] "GGB58158_SGB79798|SGB79798"
## [255,] "Pseudoflavonifractor_capillosus|SGB15140"
## [256,] "Oliverpabstia_intestinalis|SGB4868"
## [257,] "Veillonella_parvula|SGB6939"
## [258,] "Colidextribacter_sp_210702_DFI_3_9|SGB15146"
## [259,] "Coprobacter_fastidiosus|SGB1963"
## [260,] "Veillonella_rogosae|SGB6956"
## [261,] "Eubacteriaceae_bacterium|SGB3958"
## [262,] "TM7_phylum_sp_oral_taxon_352|SGB19860_group"
## [263,] "Enterocloster_hominis|SGB4721"
## [264,] "Rothia_mucilaginosa|SGB16971_group"
## [265,] "Blautia_luti|SGB4832"
## [266,] "Blautia_sp_OF03_15BH|SGB4779"
## [267,] "Streptococcus_australis|SGB8059_group"
## [268,] "Ruminococcus_gnavus|SGB4571"
## [269,] "Streptococcus_thermophilus|SGB8002"
## [270,] "Roseburia_hominis|SGB4659"
## [271,] "Lentihominibacter_faecis|SGB3957"
## [272,] "Actinomyces_sp_ICM47|SGB17167_group"
## [273,] "Evtepia_gabavorous|SGB15120"
## [274,] "Blautia_obeum|SGB4810"
## [275,] "GGB3480_SGB4648|SGB4648"
## [276,] "Ellagibacter_isourolithinifaciens|SGB14816_group"
## [277,] "GGB2982_SGB3964|SGB3964"
## [278,] "Anaerobutyricum_soehngenii|SGB4537"
## [279,] "Bacteroides_xylanisolvens|SGB1867"
## [280,] "Neobittarella_massiliensis|SGB7264"
## [281,] "Faecalibacillus_intestinalis|SGB6754"
## [282,] "GGB9708_SGB15233|SGB15233"
## [283,] "Enterocloster_asparagiformis|SGB4724"
## [284,] "GGB51884_SGB49168|SGB49168"
## [285,] "Enterocloster_aldenensis|SGB4762"
## [286,] "Eggerthella_lenta|SGB14809"
## [287,] "Bacteroides_fragilis|SGB1855"
## [288,] "Veillonella_atypica|SGB6936"
## [289,] "Enterocloster_bolteae|SGB4758"
## [290,] "Romboutsia_timonensis|SGB6148"
## [291,] "Roseburia_lenta|SGB4957"
## [292,] "GGB3288_SGB4342|SGB4342"
## [293,] "GGB9574_SGB14987|SGB14987"
## [294,] "Dorea_longicatena|SGB4582"
## [295,] "Eubacterium_sp_AF34_35BH|SGB5051"
## [296,] "GGB9640_SGB15115|SGB15115"
## [297,] "Clostridiaceae_unclassified_SGB4771|SGB4771"
## [298,] "GGB9766_SGB15383|SGB15383"
## [299,] "Lachnospiraceae_bacterium_OM04_12BH|SGB4893"
## [300,] "Clostridium_SGB6173|SGB6173"
## [301,] "GGB9616_SGB15051|SGB15051"
## [302,] "Intestinibacter_bartlettii|SGB6140"
# taxa present in both the training and the validation data
common_microbiome <- intersect(
  colnames(data_microbiome_original),
  colnames(data_microbiome_validation)
)[-c(1:2)]
tr1 <- data_microbiome_original %>%
  mutate(Data = if_else(Country == 'CZ', 'CZ_tr', 'IT_tr')) %>%
  select(Data, Diet, all_of(common_microbiome))
tr2 <- data_microbiome_validation %>%
  mutate(Data = 'valid') %>%
  select(Data, Diet, all_of(common_microbiome))
data_merged <- bind_rows(tr1, tr2)
data_microbiome_validation$`Wujia_chipingenss|SGB5111`
## NULL
data_microbiome_validation$`Wujia_chipingensis|SGB5111`
## [1] -2.2659270 -1.8366784 -1.5719579 -1.5649595 -1.3519655 -1.9273808
## [7] -0.5872619 -1.9902827 -0.9982257 -1.6966210 -2.1930395 -0.1319288
## [13] -2.1260143 -1.9245373 -2.0088636 0.5526297 3.9926563 -0.7907929
## [19] -2.1891711 -1.9395276 -2.1353811 -2.1654279 -2.2360908 -0.3930113
## [25] -2.3957911 -0.2977329 -1.4533578 -1.4072936 -2.0794670 1.8049062
## [31] 1.5258843 3.6617309 3.8680214 -2.0412347 0.4626366 0.3866172
## [37] 5.8044209 -2.2201075 -0.4444399 -2.1377634 -2.1399921 -2.0933531
## [43] 4.5931692 -1.6871195 2.8164681 -2.5644101 -2.0585439 -2.6629664
## [49] 0.6723405 4.4534434 4.1610574 5.3832639 -1.6933290 -2.5382040
## [55] -2.4107280 -2.1198776 5.3413842 -1.8967301 -1.5487724 -2.1396917
## [61] -1.7628668 4.1335815 5.0872100 3.6031291 -1.8040730 4.9799740
## [67] -0.9850500 -1.5123666 -0.5770878 -1.7964678 -2.1354643 2.3798401
## [73] 5.9776544 -2.2009657 -2.4395272 4.0615507 -1.3941153 -2.3882306
## [79] -2.2799115 4.6054080 -1.5516203 -0.4367249 -1.5615449 4.0790976
## [85] 1.0917225 3.7139508 -1.8139512 -1.1749955 -1.6831395 -1.5614355
## [91] -1.9565894 -0.1425388 -1.0454574 -2.1128182 4.9182020 -1.6631032
## [97] -2.3185501 -1.6734295 -1.6233100 -1.5638780 3.3149668 4.7828809
## [103] -1.5505336
3.2 Explore
3.2.0.1 Distributions - CLR-transformed
check <- data_microbiome_original %>%
  dplyr::select(-c(Sample:Diet)) %>%
  na.omit()
# 8 x 7 grid of histograms for 56 randomly chosen taxa
size <- c(8, 7)
par(mfrow = c(size[1], size[2]))
par(mar = c(2, 1.5, 2, 0.5))
set.seed(16)
ran <- sample(1:ncol(check), size[1] * size[2], replace = FALSE)
for (x in ran) {
  hist(check[[x]],   # [[ ]] returns a numeric vector for data frames and tibbles alike
       breaks = 16,
       col = 'blue',
       main = colnames(check)[x])
}
3.2.0.2 Taxon proportions across groups
colo <- c('#329243', '#F9FFAF')
data_merged <- na.omit(data_merged)
# 35 randomly chosen taxa, one panel each
outcomes <- common_microbiome[
  sample(seq_along(common_microbiome), 35, replace = FALSE)
]
boxplot_cond <- function(variable) {
  p <- ggboxplot(data_merged,
                 x = 'Diet',
                 y = variable,
                 fill = 'Diet',
                 tip.length = 0.15,
                 palette = colo,
                 outlier.shape = 1,
                 lwd = 0.25,
                 outlier.size = 0.8,
                 facet.by = 'Data',
                 title = variable,
                 ylab = 'CLR(taxa proportion)') +
    theme(
      plot.title = element_text(size = 10),
      axis.title = element_text(size = 8),
      axis.text.y = element_text(size = 7),
      axis.text.x = element_blank(),
      axis.title.x = element_blank()
    )
  return(p)
}
# Plot all outcomes
plots <- map(outcomes, boxplot_cond)
# Create a matrix of plots
plots_arranged <- ggarrange(plotlist = plots, ncol = 5, nrow = 7, common.legend = TRUE)
plots_arranged
4 Linear models across taxa
For each taxon, we fit a separate linear model. The CLR-transformed read count is the outcome. Country (Italy vs Czech Republic), diet (vegan vs omnivore) and their interaction (country:diet) are fixed-effect predictors. Each model has the form
\[ CLR(N_i) = \alpha + \beta_{1} \times country + \beta_{2} \times diet + \beta_{3} \times country \times diet + \epsilon \] where \(N_i\) is the read count of the \(i\)-th bacterial taxon.
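The centred log-ratio (CLR) transform takes each log count in a sample and subtracts that sample's mean log count. The minimal sketch below shows this for one hypothetical sample. It assumes a pseudocount of 1 to avoid log(0). The zero-handling actually used upstream in this report is not shown here and may differ.

```r
# Minimal CLR sketch for one sample (a vector of taxon read counts).
# Assumption: pseudocount of 1 to avoid log(0); upstream zero handling may differ.
clr_transform <- function(counts, pseudocount = 1) {
  log_counts <- log(counts + pseudocount)
  log_counts - mean(log_counts)   # subtract the log of the geometric mean
}

counts <- c(taxonA = 120, taxonB = 30, taxonC = 0, taxonD = 850)   # hypothetical sample
clr_values <- clr_transform(counts)
clr_values
```

The CLR values of a sample sum to zero, so they describe relative abundance. A positive value means the taxon lies above the sample's geometric mean.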
The variables were coded as follows: \(diet = -0.5\) for omnivores and \(diet = 0.5\) for vegans; \(country = -0.5\) for the Czech cohort and \(country = 0.5\) for the Italian cohort.
With this coding, the diet coefficient in the model summary equals the diet effect averaged over both countries. Likewise, the country coefficient equals the country effect averaged over both diet groups. We then use the emmeans package (Lenth 2024) to obtain the diet effect in the Italian and Czech cohorts separately, still from the same single model.
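The sketch below uses simulated data to show why the ±0.5 coding makes the `Diet_VEGAN` coefficient the average of the two country-specific diet effects. It also shows how those country-specific effects and their 95% CIs follow from the coefficients and `vcov()`. The report itself obtains them with emmeans. The simulated data, effect sizes and helper `diet_effect_in()` are hypothetical and serve illustration only.

```r
# Simulated data (hypothetical effect sizes) using the report's +/-0.5 coding
set.seed(1)
n <- 200
sim <- data.frame(
  Country_IT = rep(c(-0.5, 0.5), each = n / 2),   # -0.5 = CZ, 0.5 = IT
  Diet_VEGAN = rep(c(-0.5, 0.5), times = n / 2)   # -0.5 = omnivore, 0.5 = vegan
)
# Assumed true diet effects: -2 in CZ, -1 in IT
sim$clr <- 1 + 0.5 * sim$Country_IT +
  ifelse(sim$Country_IT < 0, -2, -1) * sim$Diet_VEGAN + rnorm(n)

fit <- lm(clr ~ Country_IT * Diet_VEGAN, data = sim)
b <- coef(fit)
V <- vcov(fit)

# Diet effect within the country coded `country_code`:
# beta_diet + country_code * beta_interaction, with a t-based 95% CI
diet_effect_in <- function(country_code) {
  est <- unname(b["Diet_VEGAN"] + country_code * b["Country_IT:Diet_VEGAN"])
  se <- sqrt(V["Diet_VEGAN", "Diet_VEGAN"] +
               country_code^2 * V["Country_IT:Diet_VEGAN", "Country_IT:Diet_VEGAN"] +
               2 * country_code * V["Diet_VEGAN", "Country_IT:Diet_VEGAN"])
  half_width <- qt(0.975, df.residual(fit)) * se
  c(estimate = est, lower = est - half_width, upper = est + half_width)
}

in_CZ <- diet_effect_in(-0.5)
in_IT <- diet_effect_in(0.5)
rbind(CZ = in_CZ, IT = in_IT)

# The averaged diet effect equals the mean of the two country-specific effects
all.equal(unname(b["Diet_VEGAN"]), mean(c(in_CZ[["estimate"]], in_IT[["estimate"]])))
```

emmeans' `pairs()` reports the earlier grid level minus the later one, here omnivore minus vegan. That is why the loop in section 4.1.3 negates the estimates and swaps the CI bounds.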
Taxa with a significant diet effect will then be visualized in a forest plot. Significance here refers to the diet effect averaged across both countries, at FDR < 0.1 after adjustment for multiple comparisons. The plot shows the country-specific diet effects alongside the diet effect estimated in the independent validation cohort, to evaluate how generalizable these findings are (see the external validation section).
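The FDR adjustment used in this report is `p.adjust(..., method = 'BH')`, which implements the Benjamini-Hochberg step-up procedure. The hand-written `bh_adjust()` below is a hypothetical helper for illustration only. It uses made-up p-values and reproduces the adjustment so that the thresholding logic is explicit.

```r
# Benjamini-Hochberg adjustment written out by hand (illustrative p-values)
bh_adjust <- function(p) {
  m <- length(p)
  ord <- order(p, decreasing = TRUE)   # process from the largest p-value down
  rank_asc <- m:1                      # ascending rank of each p-value in that order
  # p * m / rank, then a running minimum so adjusted values never decrease with p
  adj <- cummin((m / rank_asc) * p[ord])
  out <- numeric(m)
  out[ord] <- pmin(adj, 1)             # cap at 1 and restore the original order
  out
}

p_values <- c(0.001, 0.008, 0.039, 0.041, 0.042, 0.065, 0.074, 0.205, 0.212, 0.216)
bh_manual <- bh_adjust(p_values)
round(bh_manual, 4)
which(bh_manual < 0.1)   # features passing an FDR threshold of 0.1
```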
Note that the p-values for the averaged (avg) effects are identical to those produced by car::Anova(model, type = 'III').
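This equivalence can be checked without the car package. Base R's `drop1()`, given the full term list as `scope`, drops each term in turn while keeping all the others. This is a Type III-style test, and for a one-column term its F-test p-value equals the coefficient's t-test p-value. The sketch below checks this on simulated data, which is hypothetical.

```r
# Simulated balanced design with the report's +/-0.5 coding (hypothetical data)
set.seed(2)
d <- expand.grid(Country_IT = c(-0.5, 0.5), Diet_VEGAN = c(-0.5, 0.5), rep = 1:25)
d$y <- 0.3 * d$Country_IT - 0.8 * d$Diet_VEGAN + rnorm(nrow(d))
fit <- lm(y ~ Country_IT * Diet_VEGAN, data = d)

# t-test p-values from the coefficient table (intercept excluded)
p_t <- summary(fit)$coefficients[-1, "Pr(>|t|)"]

# F-tests that drop one term at a time while keeping all the others
terms_all <- c("Country_IT", "Diet_VEGAN", "Country_IT:Diet_VEGAN")
p_F <- drop1(fit, scope = terms_all, test = "F")[terms_all, "Pr(>F)"]

cbind(t_test = p_t, F_test = p_F)
```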
4.1 Select data
data_analysis <- data_microbiome_original %>%
  na.omit() %>%
  dplyr::mutate(
    # +/-0.5 coding: vegan = 0.5, omnivore = -0.5
    Diet_VEGAN = dplyr::if_else(Diet == "VEGAN", 0.5, -0.5),
    # +/-0.5 coding: Italy = 0.5, Czech Republic = -0.5
    Country_IT = dplyr::if_else(Country == "IT", 0.5, -0.5)
  ) %>%
  dplyr::select(
    Sample,
    Country,
    Country_IT,
    Diet,
    Diet_VEGAN,
    dplyr::everything()
  )
summary(data_analysis[ , 1:12])
## Sample Country Country_IT Diet
## Length:166 Length:166 Min. :-0.50000 Length:166
## Class :character Class :character 1st Qu.:-0.50000 Class :character
## Mode :character Mode :character Median :-0.50000 Mode :character
## Mean :-0.03012
## 3rd Qu.: 0.50000
## Max. : 0.50000
## Diet_VEGAN Bacteroides_stercoris|SGB1830 Alistipes_putredinis|SGB2318
## Min. :-0.50000 Min. :-4.8322 Min. :-5.438
## 1st Qu.:-0.50000 1st Qu.:-3.6490 1st Qu.: 3.908
## Median : 0.50000 Median :-1.3277 Median : 6.130
## Mean : 0.08434 Mean : 0.5654 Mean : 4.703
## 3rd Qu.: 0.50000 3rd Qu.: 4.7830 3rd Qu.: 7.159
## Max. : 0.50000 Max. : 9.4417 Max. : 9.897
## Candidatus_Cibionibacter_quicibialis|SGB15286 Bacteroides_uniformis|SGB1836
## Min. :-2.250 Min. :-3.273
## 1st Qu.: 4.278 1st Qu.: 4.743
## Median : 5.405 Median : 6.158
## Mean : 5.113 Mean : 5.964
## 3rd Qu.: 6.447 3rd Qu.: 7.395
## Max. : 9.195 Max. :10.043
## Eubacterium_siraeum|SGB4198 GGB9602_SGB15031|SGB15031
## Min. :-5.694 Min. :-4.2797
## 1st Qu.:-4.539 1st Qu.:-2.6107
## Median :-2.937 Median : 0.3740
## Mean :-1.269 Mean : 0.5761
## 3rd Qu.: 2.030 3rd Qu.: 3.4878
## Max. : 6.338 Max. : 8.0727
## Phocaeicola_vulgatus|SGB1814
## Min. :-3.802
## 1st Qu.: 4.296
## Median : 5.775
## Mean : 5.483
## 3rd Qu.: 7.100
## Max. :10.519
4.1.1 Define number of microbiome features and covariates
n_covarites <- 5   # Sample, Country, Country_IT, Diet, Diet_VEGAN
n_features <- ncol(data_analysis) - n_covarites
4.1.2 Create empty objects
outcome <- vector('character', n_features)   # taxon names
logFD_VGdiet_inCZ <- vector('double', n_features)
logFD_VGdiet_inIT <- vector('double', n_features)
logFD_VGdiet_avg <- vector('double', n_features)
logFD_ITcountry_avg <- vector('double', n_features)
diet_country_int <- vector('double', n_features)
P_VGdiet_inCZ <- vector('double', n_features)
P_VGdiet_inIT <- vector('double', n_features)
P_VGdiet_avg <- vector('double', n_features)
P_ITcountry_avg <- vector('double', n_features)
P_diet_country_int <- vector('double', n_features)
CI_L_VGdiet_inCZ <- vector('double', n_features)
CI_L_VGdiet_inIT <- vector('double', n_features)
CI_L_VGdiet_avg <- vector('double', n_features)
CI_U_VGdiet_inCZ <- vector('double', n_features)
CI_U_VGdiet_inIT <- vector('double', n_features)
CI_U_VGdiet_avg <- vector('double', n_features)
4.1.3 Estimate over outcomes
for (i in 1:n_features) {
  ## define outcome: CLR values of the i-th taxon ([[ ]] returns a plain vector)
  data_analysis$outcome <- data_analysis[[i + n_covarites]]
  ## fit model
  model <- lm(outcome ~ Country_IT * Diet_VEGAN, data = data_analysis)
  coefs <- summary(model)$coefficients
  ci <- confint(model)
  ## diet contrast within each country; pairs() returns omnivore - vegan
  ## (-0.5 minus 0.5), so estimates are negated and CI bounds swapped below
  contrast_emm <- summary(
    pairs(
      emmeans(model, specs = ~ Diet_VEGAN | Country_IT),
      interaction = TRUE,
      adjust = "none"
    ),
    infer = c(TRUE, TRUE)
  )
  ## save results
  outcome[i] <- names(data_analysis)[i + n_covarites]
  ## country effect (averaged over diets)
  logFD_ITcountry_avg[i] <- coefs["Country_IT", "Estimate"]
  P_ITcountry_avg[i] <- coefs["Country_IT", "Pr(>|t|)"]
  ## diet effect (averaged over countries)
  logFD_VGdiet_avg[i] <- coefs["Diet_VEGAN", "Estimate"]
  P_VGdiet_avg[i] <- coefs["Diet_VEGAN", "Pr(>|t|)"]
  CI_L_VGdiet_avg[i] <- ci["Diet_VEGAN", 1]
  CI_U_VGdiet_avg[i] <- ci["Diet_VEGAN", 2]
  ## diet effect within each country (row 1 = CZ, row 2 = IT)
  logFD_VGdiet_inCZ[i] <- -contrast_emm[1, 3]
  P_VGdiet_inCZ[i] <- contrast_emm$p.value[1]
  CI_L_VGdiet_inCZ[i] <- -contrast_emm$upper.CL[1]
  CI_U_VGdiet_inCZ[i] <- -contrast_emm$lower.CL[1]
  logFD_VGdiet_inIT[i] <- -contrast_emm[2, 3]
  P_VGdiet_inIT[i] <- contrast_emm$p.value[2]
  CI_L_VGdiet_inIT[i] <- -contrast_emm$upper.CL[2]
  CI_U_VGdiet_inIT[i] <- -contrast_emm$lower.CL[2]
  ## interaction
  diet_country_int[i] <- coefs["Country_IT:Diet_VEGAN", "Estimate"]
  P_diet_country_int[i] <- coefs["Country_IT:Diet_VEGAN", "Pr(>|t|)"]
}
4.1.4 Results table
result_microbiome <- data.frame(
  outcome,
  logFD_ITcountry_avg, P_ITcountry_avg,
  logFD_VGdiet_avg, P_VGdiet_avg,
  logFD_VGdiet_inCZ, P_VGdiet_inCZ,
  logFD_VGdiet_inIT, P_VGdiet_inIT,
  diet_country_int, P_diet_country_int,
  CI_L_VGdiet_avg, CI_U_VGdiet_avg,
  CI_L_VGdiet_inCZ, CI_U_VGdiet_inCZ,
  CI_L_VGdiet_inIT, CI_U_VGdiet_inIT
)
4.1.5 Adjust p values
result_microbiome <- result_microbiome %>%
  dplyr::mutate(
    fdr_ITcountry_avg = p.adjust(P_ITcountry_avg, method = 'BH'),
    fdr_VGdiet_avg = p.adjust(P_VGdiet_avg, method = 'BH'),
    fdr_VGdiet_inCZ = p.adjust(P_VGdiet_inCZ, method = 'BH'),
    fdr_VGdiet_inIT = p.adjust(P_VGdiet_inIT, method = 'BH'),
    fdr_diet_country_int = p.adjust(P_diet_country_int, method = 'BH')
  ) %>%
  dplyr::select(
    outcome,
    logFD_ITcountry_avg, P_ITcountry_avg, fdr_ITcountry_avg,
    logFD_VGdiet_avg, P_VGdiet_avg, fdr_VGdiet_avg,
    logFD_VGdiet_inCZ, P_VGdiet_inCZ, fdr_VGdiet_inCZ,
    logFD_VGdiet_inIT, P_VGdiet_inIT, fdr_VGdiet_inIT,
    diet_country_int, P_diet_country_int, fdr_diet_country_int,
    CI_L_VGdiet_avg, CI_U_VGdiet_avg,
    CI_L_VGdiet_inCZ, CI_U_VGdiet_inCZ,
    CI_L_VGdiet_inIT, CI_U_VGdiet_inIT
  )
4.1.6 Show and save results
kableExtra::kable(
  result_microbiome %>% filter(fdr_VGdiet_avg < 0.05),
  caption = "Results of linear models of the CLR-transformed read count of each bacterial taxon, with `Diet`, `Country` and the `Diet:Country` interaction as predictors. Only taxa whose CLR-transformed abundance differs between diets are shown (FDR < 0.05 for the diet effect averaged across both countries). `logFD` prefix: estimated effect (regression coefficient), i.e. the difference in CLR-transformed read count between vegans and omnivores (or between the Italian and Czech cohorts for `logFD_ITcountry_avg`); `P`: p-value; `fdr`: p-value adjusted for multiple comparisons (Benjamini-Hochberg); `CI_L` and `CI_U`: lower and upper bounds of the 95% confidence interval, respectively. The `avg` suffix denotes the effect averaged across subgroups, whereas `inCZ` and `inIT` denote the effect in the Czech and Italian cohort, respectively. All estimates in a single row come from a single model."
)

| outcome | logFD_ITcountry_avg | P_ITcountry_avg | fdr_ITcountry_avg | logFD_VGdiet_avg | P_VGdiet_avg | fdr_VGdiet_avg | logFD_VGdiet_inCZ | P_VGdiet_inCZ | fdr_VGdiet_inCZ | logFD_VGdiet_inIT | P_VGdiet_inIT | fdr_VGdiet_inIT | diet_country_int | P_diet_country_int | fdr_diet_country_int | CI_L_VGdiet_avg | CI_U_VGdiet_avg | CI_L_VGdiet_inCZ | CI_U_VGdiet_inCZ | CI_L_VGdiet_inIT | CI_U_VGdiet_inIT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Escherichia_coli\|SGB10068 | 0.2619264 | 0.5866636 | 0.6604881 | -1.6056687 | 0.0010415 | 0.0124560 | -1.7212276 | 0.0123629 | 0.1114726 | -1.4901099 | 0.0297547 | 0.2433353 | 0.2311177 | 0.8103673 | 0.9929048 | -2.5551229 | -0.6562146 | -3.0647170 | -0.3777381 | -2.8320819 | -0.1481379 |
| Bacteroides_clarus\|SGB1832 | 1.9131572 | 0.0002848 | 0.0008431 | -1.9852453 | 0.0001701 | 0.0031781 | -1.6674895 | 0.0236093 | 0.1534603 | -2.3030012 | 0.0018864 | 0.0961993 | -0.6355117 | 0.5386671 | 0.9929048 | -3.0036467 | -0.9668440 | -3.1085401 | -0.2264388 | -3.7424241 | -0.8635783 |
| Hydrogenoanaerobacterium_saccharovorans\|SGB15350 | 1.5011518 | 0.0012291 | 0.0032813 | -1.3582916 | 0.0033582 | 0.0313785 | -1.3329832 | 0.0405517 | 0.2090509 | -1.3835999 | 0.0334061 | 0.2561137 | -0.0506167 | 0.9558340 | 0.9929048 | -2.2592946 | -0.4572885 | -2.6079137 | -0.0580527 | -2.6570904 | -0.1101095 |
| GGB6649_SGB9391\|SGB9391 | 0.2325600 | 0.5179518 | 0.6002620 | 1.0140390 | 0.0053203 | 0.0418620 | 1.7803147 | 0.0005900 | 0.0117612 | 0.2477634 | 0.6259425 | 0.9452363 | -1.5325513 | 0.0342745 | 0.5717111 | 0.3052515 | 1.7228266 | 0.7773715 | 2.7832579 | -0.7540470 | 1.2495737 |
| Ruthenibacterium_lactatiformans\|SGB15271 | 0.9036294 | 0.0030971 | 0.0073356 | -0.8414746 | 0.0057914 | 0.0444010 | -1.5432722 | 0.0003871 | 0.0100784 | -0.1396771 | 0.7430182 | 0.9689172 | 1.4035951 | 0.0209147 | 0.5717111 | -1.4356716 | -0.2472776 | -2.3840684 | -0.7024759 | -0.9795236 | 0.7001695 |
| Lawsonibacter_asaccharolyticus\|SGB15154 | 2.8249857 | 0.0000000 | 0.0000000 | -1.5247873 | 0.0000971 | 0.0023730 | -1.9303765 | 0.0004595 | 0.0105548 | -1.1191981 | 0.0394843 | 0.2775605 | 0.8111784 | 0.2892321 | 0.9724308 | -2.2780362 | -0.7715384 | -2.9962332 | -0.8645198 | -2.1838509 | -0.0545454 |
| Clostridium_fessum\|SGB4705 | -1.6158900 | 0.0000443 | 0.0001577 | -1.1006841 | 0.0048020 | 0.0398833 | -1.1368423 | 0.0384349 | 0.2063733 | -1.0645259 | 0.0521082 | 0.3314966 | 0.0723164 | 0.9252760 | 0.9929048 | -1.8607983 | -0.3405699 | -2.2124133 | -0.0612713 | -2.1388820 | 0.0098303 |
| GGB9707_SGB15229\|SGB15229 | 0.6452823 | 0.0445093 | 0.0756152 | 1.0866666 | 0.0008197 | 0.0106555 | 0.9548967 | 0.0357238 | 0.2054116 | 1.2184364 | 0.0075553 | 0.1328848 | 0.2635397 | 0.6797760 | 0.9929048 | 0.4574080 | 1.7159252 | 0.0644880 | 1.8453055 | 0.3290334 | 2.1078395 |
| Lachnospiraceae_bacterium_AM48_27BH\|SGB4706 | -1.1131285 | 0.0045403 | 0.0102071 | 1.2676366 | 0.0012809 | 0.0147307 | 1.6523047 | 0.0029449 | 0.0463439 | 0.8829685 | 0.1082011 | 0.4901839 | -0.7693361 | 0.3214104 | 0.9724308 | 0.5039082 | 2.0313650 | 0.5716194 | 2.7329899 | -0.1964961 | 1.9624331 |
| GGB9509_SGB14906\|SGB14906 | 1.0817985 | 0.0089485 | 0.0182013 | -1.6543521 | 0.0000804 | 0.0023730 | -1.5741131 | 0.0072217 | 0.0835575 | -1.7345911 | 0.0031102 | 0.0961993 | -0.1604780 | 0.8446550 | 0.9929048 | -2.4617079 | -0.8469964 | -2.7165316 | -0.4316946 | -2.8757193 | -0.5934630 |
| GGB45432_SGB63101\|SGB63101 | 0.7232262 | 0.0088808 | 0.0181874 | -0.7958911 | 0.0040635 | 0.0368175 | -0.8984331 | 0.0212966 | 0.1415041 | -0.6933491 | 0.0742736 | 0.3998957 | 0.2050839 | 0.7077520 | 0.9929048 | -1.3350948 | -0.2566874 | -1.6614131 | -0.1354531 | -1.4554673 | 0.0687691 |
| GGB2653_SGB3574\|SGB3574 | 0.0113250 | 0.9753715 | 0.9753715 | -1.4691557 | 0.0000921 | 0.0023730 | -1.8428969 | 0.0004942 | 0.0105548 | -1.0954145 | 0.0358765 | 0.2681768 | 0.7474824 | 0.3090608 | 0.9724308 | -2.1924343 | -0.7458771 | -2.8663451 | -0.8194487 | -2.1177067 | -0.0731223 |
| Phocea_massiliensis\|SGB14837 | 0.4892877 | 0.1130213 | 0.1698159 | -1.1101840 | 0.0004001 | 0.0058102 | -1.0621671 | 0.0155770 | 0.1164377 | -1.1582010 | 0.0083931 | 0.1348712 | -0.0960339 | 0.8759362 | 0.9929048 | -1.7165656 | -0.5038025 | -1.9202046 | -0.2041296 | -2.0152693 | -0.3011327 |
| Lachnospiraceae_bacterium\|SGB4953 | -1.0110542 | 0.0196470 | 0.0376567 | 1.8996297 | 0.0000175 | 0.0007650 | 2.0042382 | 0.0011843 | 0.0208299 | 1.7950211 | 0.0035391 | 0.0961993 | -0.2092171 | 0.8076874 | 0.9929048 | 1.0523587 | 2.7469007 | 0.8053392 | 3.2031372 | 0.5974763 | 2.9925660 |
| Holdemania_filiformis\|SGB4046 | -0.8713113 | 0.0335927 | 0.0590836 | -1.2995237 | 0.0016731 | 0.0185275 | -2.2053986 | 0.0001804 | 0.0079526 | -0.3936488 | 0.4942922 | 0.8905572 | 1.8117499 | 0.0272454 | 0.5717111 | -2.1023519 | -0.4966955 | -3.3414106 | -1.0693867 | -1.5283776 | 0.7410800 |
| Anaeromassilibacillus_senegalensis\|SGB14894 | 1.0587520 | 0.0067348 | 0.0142817 | -1.2016868 | 0.0021720 | 0.0223943 | -1.3537201 | 0.0141471 | 0.1114726 | -1.0496536 | 0.0559301 | 0.3474891 | 0.3040665 | 0.6939733 | 0.9929048 | -1.9633423 | -0.4400314 | -2.4314720 | -0.2759681 | -2.1261882 | 0.0268810 |
| Ruminococcus_sp_AF41_9\|SGB25497 | -0.8299026 | 0.0411850 | 0.0707719 | 1.7828783 | 0.0000179 | 0.0007650 | 2.7832380 | 0.0000025 | 0.0001522 | 0.7825187 | 0.1716609 | 0.6497039 | -2.0007193 | 0.0141307 | 0.5281335 | 0.9865895 | 2.5791671 | 1.6564794 | 3.9099965 | -0.3429672 | 1.9080046 |
| Eubacterium_ramulus\|SGB4959 | -2.2432627 | 0.0000000 | 0.0000003 | -1.3014646 | 0.0010277 | 0.0124560 | -1.5154793 | 0.0066098 | 0.0835575 | -1.0874499 | 0.0497902 | 0.3258371 | 0.4280295 | 0.5832078 | 0.9929048 | -2.0701289 | -0.5328003 | -2.6031489 | -0.4278097 | -2.1738909 | -0.0010088 |
| Youxingia_wuxianensis\|SGB82503 | 0.2836651 | 0.4199975 | 0.5187690 | -1.0654123 | 0.0027893 | 0.0278003 | -1.7207509 | 0.0006765 | 0.0126413 | -0.4100738 | 0.4095044 | 0.8055383 | 1.3106771 | 0.0635956 | 0.7486670 | -1.7582618 | -0.3725628 | -2.7011416 | -0.7403602 | -1.3893571 | 0.5692096 |
| Blautia_massiliensis\|SGB4826 | -3.9054717 | 0.0000000 | 0.0000000 | -1.7487130 | 0.0019052 | 0.0203449 | -1.6636803 | 0.0353443 | 0.2054116 | -1.8337457 | 0.0204108 | 0.2087421 | -0.1700654 | 0.8782097 | 0.9929048 | -2.8427577 | -0.6546682 | -3.2117672 | -0.1155933 | -3.3800841 | -0.2874073 |
| GGB9765_SGB15382\|SGB15382 | 0.9724670 | 0.0052880 | 0.0115410 | -1.0403415 | 0.0028965 | 0.0279370 | -0.6868283 | 0.1601205 | 0.4949617 | -1.3938547 | 0.0046947 | 0.1169756 | -0.7070263 | 0.3055987 | 0.9724308 | -1.7195795 | -0.3611035 | -1.6479586 | 0.2743020 | -2.3538993 | -0.4338100 |
| GGB9770_SGB15390\|SGB15390 | 1.1393138 | 0.0031158 | 0.0073356 | -1.1023969 | 0.0042003 | 0.0369375 | -1.1048957 | 0.0413047 | 0.2093238 | -1.0998980 | 0.0419905 | 0.2853448 | 0.0049977 | 0.9947562 | 0.9950745 | -1.8520565 | -0.3527372 | -2.1656735 | -0.0441180 | -2.1594776 | -0.0403184 |
| Ruminococcus_torques\|SGB4608 | -2.5450549 | 0.0000002 | 0.0000009 | -2.9967166 | 0.0000000 | 0.0000002 | -4.0037248 | 0.0000000 | 0.0000008 | -1.9897083 | 0.0028797 | 0.0961993 | 2.0140166 | 0.0318587 | 0.5717111 | -3.9152587 | -2.0781744 | -5.3034734 | -2.7039763 | -3.2879887 | -0.6914278 |
| Agathobaculum_butyriciproducens\|SGB14991 | -0.2799754 | 0.5736608 | 0.6497143 | 1.8298077 | 0.0003116 | 0.0049041 | 1.7918165 | 0.0116982 | 0.1114726 | 1.8677989 | 0.0085704 | 0.1348712 | 0.0759824 | 0.9391106 | 0.9929048 | 0.8492138 | 2.8104015 | 0.4042640 | 3.1793691 | 0.4818136 | 3.2537841 |
| Clostridium_SGB48024\|SGB48024 | 0.3495793 | 0.4685239 | 0.5603545 | 1.8824245 | 0.0001341 | 0.0027026 | 2.0858928 | 0.0025593 | 0.0425134 | 1.6789563 | 0.0145882 | 0.1671720 | -0.4069364 | 0.6729259 | 0.9929048 | 0.9323470 | 2.8325021 | 0.7415212 | 3.4302643 | 0.3361033 | 3.0218094 |
| GGB9531_SGB14932\|SGB14932 | 0.6741149 | 0.0481296 | 0.0812355 | -1.2220355 | 0.0004081 | 0.0058102 | -1.0267774 | 0.0335676 | 0.2007343 | -1.4172937 | 0.0035162 | 0.0961993 | -0.3905163 | 0.5648881 | 0.9929048 | -1.8905380 | -0.5535330 | -1.9727167 | -0.0808380 | -2.3621646 | -0.4724228 |
| Oscillibacter_valericigenes\|SGB15076 | 2.2693025 | 0.0000001 | 0.0000007 | -1.5455775 | 0.0002310 | 0.0038371 | -0.3939028 | 0.4984511 | 0.7655611 | -2.6972522 | 0.0000068 | 0.0020399 | -2.3033494 | 0.0056168 | 0.3358826 | -2.3558148 | -0.7353402 | -1.5403986 | 0.7525931 | -3.8424531 | -1.5520513 |
| GGB9616_SGB15052\|SGB15052 | 0.1922882 | 0.6243555 | 0.6939863 | 1.5604516 | 0.0001032 | 0.0023730 | 1.3778329 | 0.0139910 | 0.1114726 | 1.7430703 | 0.0019661 | 0.0961993 | 0.3652374 | 0.6418749 | 0.9929048 | 0.7865111 | 2.3343920 | 0.2826975 | 2.4729683 | 0.6491719 | 2.8369687 |
| GGB58158_SGB79798\|SGB79798 | -0.0395133 | 0.9036721 | 0.9285153 | -0.9249820 | 0.0051278 | 0.0414382 | -1.2410264 | 0.0078830 | 0.0858020 | -0.6089377 | 0.1881529 | 0.6772633 | 0.6320887 | 0.3337315 | 0.9724308 | -1.5687010 | -0.2812631 | -2.1518967 | -0.3301560 | -1.5187792 | 0.3009038 |
| Enterocloster_hominis\|SGB4721 | -0.4611836 | 0.2485952 | 0.3190128 | -1.6735473 | 0.0000436 | 0.0014488 | -2.1154645 | 0.0002425 | 0.0080565 | -1.2316301 | 0.0301117 | 0.2433353 | 0.8838345 | 0.2688329 | 0.9724308 | -2.4600398 | -0.8870548 | -3.2283612 | -1.0025678 | -2.3432697 | -0.1199904 |
| Streptococcus_thermophilus\|SGB8002 | -1.9892533 | 0.0000000 | 0.0000001 | -2.6523655 | 0.0000000 | 0.0000000 | -4.2589263 | 0.0000000 | 0.0000000 | -1.0458046 | 0.0256343 | 0.2433353 | 3.2131217 | 0.0000024 | 0.0007185 | -3.3010343 | -2.0036966 | -5.1768008 | -3.3410518 | -1.9626424 | -0.1289669 |
| Blautia_obeum\|SGB4810 | -1.2227533 | 0.0022206 | 0.0055796 | 1.8263749 | 0.0000071 | 0.0005271 | 2.8583481 | 0.0000008 | 0.0000597 | 0.7944017 | 0.1549579 | 0.6177656 | -2.0639464 | 0.0095315 | 0.4749853 | 1.0496396 | 2.6031103 | 1.7592579 | 3.9574383 | -0.3034471 | 1.8922505 |
| GGB51884_SGB49168\|SGB49168 | 0.1489066 | 0.6616867 | 0.7247045 | 1.2130993 | 0.0004677 | 0.0063559 | 1.3067445 | 0.0072659 | 0.0835575 | 1.1194540 | 0.0209440 | 0.2087421 | -0.1872905 | 0.7831313 | 0.9929048 | 0.5423556 | 1.8838430 | 0.3576339 | 2.2558552 | 0.1714154 | 2.0674927 |
| Roseburia_lenta\|SGB4957 | -1.4433630 | 0.0000086 | 0.0000362 | 1.7438636 | 0.0000001 | 0.0000112 | 2.9096930 | 0.0000000 | 0.0000001 | 0.5780342 | 0.1946353 | 0.6772633 | -2.3316588 | 0.0002817 | 0.0421119 | 1.1237786 | 2.3639486 | 2.0322650 | 3.7871210 | -0.2984027 | 1.4544712 |
| GGB3288_SGB4342\|SGB4342 | -0.3043170 | 0.4247255 | 0.5187690 | 1.4470612 | 0.0002004 | 0.0035245 | 1.9437150 | 0.0004045 | 0.0100784 | 0.9504073 | 0.0788960 | 0.4138580 | -0.9933077 | 0.1933818 | 0.9594504 | 0.6961415 | 2.1979808 | 0.8811543 | 3.0062757 | -0.1109532 | 2.0117678 |
| GGB9574_SGB14987\|SGB14987 | 0.6979519 | 0.0760440 | 0.1235716 | 1.1288124 | 0.0044098 | 0.0376722 | 0.8497545 | 0.1264137 | 0.4308016 | 1.4078702 | 0.0117570 | 0.1671720 | 0.5581157 | 0.4763145 | 0.9929048 | 0.3569149 | 1.9007099 | -0.2424901 | 1.9419991 | 0.3168594 | 2.4988811 |
| Clostridiaceae_unclassified_SGB4771\|SGB4771 | -0.0888094 | 0.8522123 | 0.8878449 | 2.1375079 | 0.0000134 | 0.0007650 | 1.8328628 | 0.0072095 | 0.0835575 | 2.4421530 | 0.0003793 | 0.0567018 | 0.6092902 | 0.5230257 | 0.9929048 | 1.1976428 | 3.0773730 | 0.5029420 | 3.1627836 | 1.1137344 | 3.7705716 |
| Lachnospiraceae_bacterium_OM04_12BH\|SGB4893 | -0.6037329 | 0.1247910 | 0.1811286 | 1.6753589 | 0.0000317 | 0.0011836 | 2.1178834 | 0.0001862 | 0.0079526 | 1.2328344 | 0.0271771 | 0.2433353 | -0.8850489 | 0.2597427 | 0.9724308 | 0.9026890 | 2.4480288 | 1.0245459 | 3.2112209 | 0.1407319 | 2.3249370 |
| GGB9616_SGB15051\|SGB15051 | 0.0491406 | 0.9109482 | 0.9296024 | 1.7151507 | 0.0001356 | 0.0027026 | 1.5312508 | 0.0146741 | 0.1125016 | 1.8990506 | 0.0025687 | 0.0961993 | 0.3677999 | 0.6756214 | 0.9929048 | 0.8488687 | 2.5814327 | 0.3054509 | 2.7570507 | 0.6746353 | 3.1234660 |
if (!file.exists('gitignore/result_microbiome_SGB30.csv')) {
  write.table(result_microbiome,
              'gitignore/result_microbiome_SGB30.csv', row.names = FALSE)
}
5 Elastic net
To assess how well microbiome features discriminate between the two diets, we used elastic net logistic regression.
Because we expected a high degree of collinearity among the features, we restricted the mixing parameter \(\alpha\) to small values (0, 0.2 or 0.4).
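For reference, glmnet fits the elastic net by minimizing the penalized negative log-likelihood below. For the binomial family, \(\ell\) is the logistic log-likelihood over \(N\) observations, and the intercept \(\beta_0\) is not penalized. We assume that `clust_glmnet()`, a helper defined elsewhere in this project, uses this standard glmnet parameterization.

\[ \min_{\beta_0,\,\beta}\; -\frac{1}{N}\,\ell(\beta_0, \beta) + \lambda \left[ \frac{1-\alpha}{2}\,\lVert \beta \rVert_2^2 + \alpha\,\lVert \beta \rVert_1 \right] \]

Setting \(\alpha = 1\) gives the lasso and \(\alpha = 0\) gives ridge regression. Small values such as 0.2 keep most of the ridge penalty. The ridge part tends to share weight among correlated features rather than keeping one of them arbitrarily, while the lasso part can still shrink some coefficients exactly to zero. The model summary reports the selected combination as \(\alpha = 0.2\) with \(\lambda \approx 0.053\).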
Model performance was evaluated by the ability to discriminate between vegan and omnivore diets. The measure of discrimination was the out-of-sample area under the ROC curve (AUC), estimated with the out-of-bag bootstrap.
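The sketch below illustrates out-of-bag (OOB) bootstrap AUC on simulated data. Each resample fits on a bootstrap sample of the rows and scores only the rows that were never drawn. It substitutes plain logistic regression (`stats::glm`) for the elastic net, and it does not reproduce the internals of the report's `clust_glmnet()` helper. The helper `auc()`, the simulated data and the percentile interval are assumptions made for illustration.

```r
# Mann-Whitney form of the AUC: probability that a random case with y = 1
# is scored higher than a random case with y = 0 (ties get 1/2 via mid-ranks)
auc <- function(score, y) {
  r <- rank(score)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

set.seed(3)
n <- 150
dat <- data.frame(x = rnorm(n))
dat$y <- rbinom(n, 1, plogis(1.5 * dat$x))   # hypothetical binary outcome

B <- 200
oob_auc <- rep(NA_real_, B)
for (b in seq_len(B)) {
  in_bag <- sample(n, n, replace = TRUE)     # bootstrap resample used for fitting
  oob <- setdiff(seq_len(n), in_bag)         # never-drawn rows (about 37%) used for testing
  if (length(unique(dat$y[oob])) < 2) next   # AUC is undefined with a single class
  fit <- glm(y ~ x, family = binomial, data = dat[in_bag, ])
  pred <- predict(fit, newdata = dat[oob, ], type = "response")
  oob_auc[b] <- auc(pred, dat$y[oob])
}

# Point estimate and a simple percentile interval over resamples
c(auc_oos = mean(oob_auc, na.rm = TRUE),
  quantile(oob_auc, c(0.025, 0.975), na.rm = TRUE))
```

The model summary shows why the OOB estimate is the one reported: in-sample AUC (0.995) is clearly more optimistic than out-of-sample AUC (0.887). The interval method used by `clust_glmnet()` is not shown here, so the percentile interval above is only one option.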
All features were centered and divided by two standard deviations, resulting in a standard deviation of 0.5.
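The data preparation below applies `arm::rescale()`. For a continuous, non-binary input, it centers the variable and divides it by two standard deviations. Binary inputs are treated differently by default. The base-R helper `scale_2sd()` is a hypothetical stand-in for the continuous case, applied here to illustrative values.

```r
# Two-standard-deviation scaling for a continuous feature (illustrative values)
scale_2sd <- function(x) (x - mean(x, na.rm = TRUE)) / (2 * sd(x, na.rm = TRUE))

x <- c(-2.3, -1.8, 0.6, 4.0, -0.8, 1.8, 4.6)
z <- scale_2sd(x)
round(c(mean = mean(z), sd = sd(z)), 10)
```

After this scaling, a one-unit change in a feature corresponds to moving from one SD below to one SD above its mean. This puts the elastic net coefficients of all features on a common scale.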
5.1 Prepare data for glmnet
data_microbiome_glmnet <- data_microbiome_original %>%
  na.omit() %>%
  dplyr::mutate(
    vegan = dplyr::if_else(Diet == "VEGAN", 1, 0),   # binary outcome: 1 = vegan
    dplyr::across(
      `Bacteroides_stercoris|SGB1830`:`Clostridiaceae_unclassified_SGB4771|SGB4771`,
      ~ arm::rescale(.)   # center and divide by 2 SD
    )
  ) %>%
  dplyr::select(
    Sample, vegan,
    `Bacteroides_stercoris|SGB1830`:`Clostridiaceae_unclassified_SGB4771|SGB4771`
  )
5.2 Fit model
modelac <- "elanet_microbiome_SGB30"
assign(
  modelac,
  run(
    expr = clust_glmnet(
      data = data_microbiome_glmnet,
      outcome = "vegan",
      clust_id = "Sample",
      sample_method = "oos_boot",
      N = 500,
      alphas = c(0, 0.2, 0.4),   # small alpha values only (see above)
      family = "binomial",
      seed = 478
    ),
    path = paste0("gitignore/run/", modelac)
  )
)
5.3 Model summary
elanet_microbiome_SGB30$model_summary
## alpha lambda auc auc_OutOfSample auc_oos_CIL auc_oos_CIU accuracy
## 1 0.2 0.05294768 0.9949201 0.887315 0.8047796 0.9523366 0.9698795
## accuracy_OutOfSample accuracy_oos_CIL accuracy_oos_CIU
## 1 0.7973382 0.6949153 0.8833333
5.4 Calibration plot
elanet_microbiome_SGB30$plot
5.5 Estimated coefficients
data.frame(
  # names and values of the non-zero coefficients
  microbiome = row.names(elanet_microbiome_SGB30$betas)[
    which(abs(elanet_microbiome_SGB30$betas) > 0)
  ],
  beta = elanet_microbiome_SGB30$betas[abs(elanet_microbiome_SGB30$betas) > 0]
) %>%
  mutate(
    # 1 if the taxon is also present in the external validation cohort
    is_in_ExtValCoh = if_else(microbiome %in% names(data_microbiome_validation), 1, 0)
  )
## microbiome beta
## 1 (Intercept) 0.511579554
## 2 Alistipes_putredinis|SGB2318 -0.035673015
## 3 GGB9602_SGB15031|SGB15031 -0.072598650
## 4 Phocaeicola_vulgatus|SGB1814 0.055271158
## 5 Sutterella_wadsworthensis|SGB9283 0.048565620
## 6 Parabacteroides_distasonis|SGB1934 0.077620832
## 7 GGB47687_SGB2286|SGB2286 -0.034436116
## 8 Brotolimicola_acetigignens|SGB4914 0.014966224
## 9 Faecalibacterium_prausnitzii|SGB15339 0.095440885
## 10 Faecalibacterium_SGB15346|SGB15346 0.007147743
## 11 Dialister_invisus|SGB5825_group 0.049768508
## 12 GGB33469_SGB15236|SGB15236 -0.042846768
## 13 Bacteroides_eggerthii|SGB1829 -0.176302045
## 14 Escherichia_coli|SGB10068 -0.060276967
## 15 GGB3175_SGB4191|SGB4191 -0.114603501
## 16 Oscillospiraceae_bacterium_CLA_AA_H250|SGB14861 -0.026277500
## 17 GGB9715_SGB15267|SGB15267 0.027979659
## 18 Lachnospira_pectinoschiza|SGB5075 -0.197650618
## 19 Bacteroides_clarus|SGB1832 -0.283159299
## 20 Anaerotignum_faecicola|SGB5190 -0.151027161
## 21 Hydrogenoanaerobacterium_saccharovorans|SGB15350 -0.216568806
## 22 Bacteroides_faecis|SGB1860 0.047622592
## 23 Roseburia_hominis|SGB4936 0.059440357
## 24 GGB9712_SGB15244|SGB15244 -0.010773625
## 25 Odoribacter_splanchnicus|SGB1790 -0.013240831
## 26 Bacteroides_ovatus|SGB1871 0.189828975
## 27 GGB9614_SGB15049|SGB15049 0.070090055
## 28 Barnesiella_intestinihominis|SGB1965 -0.081868455
## 29 Agathobaculum_butyriciproducens|SGB14993 0.095132263
## 30 GGB6649_SGB9391|SGB9391 0.206425435
## 31 Ruminococcus_bromii|SGB4285 0.074576850
## 32 Lawsonibacter_asaccharolyticus|SGB15154 -0.128272502
## 33 Phocaeicola_massiliensis|SGB1812 -0.001813712
## 34 Clostridium_fessum|SGB4705 -0.228912823
## 35 Clostridium_sp_AF36_4|SGB4644 0.131716733
## 36 GGB9627_SGB15081|SGB15081 0.026856354
## 37 GGB9707_SGB15229|SGB15229 0.154701269
## 38 Roseburia_inulinivorans|SGB4940 -0.028793525
## 39 Phocaeicola_dorei|SGB1815 -0.055172313
## 40 GGB9502_SGB14899|SGB14899 -0.081081523
## 41 Lachnospiraceae_bacterium_AM48_27BH|SGB4706 0.132798757
## 42 GGB9619_SGB15067|SGB15067 -0.100259747
## 43 GGB52130_SGB14966|SGB14966 -0.008609615
## 44 Blautia_sp_MCC283|SGB4828 0.021484960
## 45 GGB9509_SGB14906|SGB14906 -0.251684291
## 46 GGB45432_SGB63101|SGB63101 -0.052225983
## 47 GGB3653_SGB4964|SGB4964 0.034684642
## 48 GGB2653_SGB3574|SGB3574 -0.174495506
## 49 GGB33512_SGB15203|SGB15203 -0.009922730
## 50 Lachnospiraceae_bacterium|SGB4781 -0.086928748
## 51 Phocea_massiliensis|SGB14837 -0.151971553
## 52 Guopingia_tenuis|SGB14127 -0.105963051
## 53 Lachnospiraceae_bacterium|SGB4953 0.192602465
## 54 Blautia_SGB4815|SGB4815 -0.126582192
## 55 Holdemania_filiformis|SGB4046 -0.165864895
## 56 Anaeromassilibacillus_senegalensis|SGB14894 -0.103673015
## 57 Lachnospira_pectinoschiza|SGB5089 0.080199944
## 58 Ruminococcus_sp_AF41_9|SGB25497 0.297378148
## 59 Eubacterium_ramulus|SGB4959 -0.218176897
## 60 Enterocloster_lavalensis|SGB4725 0.137788476
## 61 Roseburia_sp_AF02_12|SGB4938 0.045941095
## 62 Youxingia_wuxianensis|SGB82503 -0.129585266
## 63 Intestinimonas_gabonensis|SGB79840 0.037774041
## 64 Blautia_massiliensis|SGB4826 -0.022144258
## 65 GGB9765_SGB15382|SGB15382 -0.183451046
## 66 GGB9770_SGB15390|SGB15390 -0.022251917
## 67 Clostridium_sp_AM49_4BH|SGB4652 0.007005792
## 68 Coprobacter_secundus|SGB1962 -0.046341779
## 69 GGB3510_SGB4687|SGB4687 0.129324333
## 70 Mediterraneibacter_faecis|SGB4563 -0.215122996
## 71 Ruminococcus_torques|SGB4608 -0.377079932
## 72 Coprococcus_comes|SGB4577 -0.050397204
## 73 Eubacterium_ventriosum|SGB5045 0.138418574
## 74 GGB9758_SGB15368|SGB15368 -0.055516524
## 75 GGB3304_SGB4367|SGB4367 0.052255707
## 76 Agathobaculum_butyriciproducens|SGB14991 0.168614919
## 77 GGB9296_SGB14253|SGB14253 0.066107880
## 78 Clostridium_sp_AF12_28|SGB4715 -0.003328052
## 79 Clostridium_SGB48024|SGB48024 0.063351217
## 80 Slackia_isoflavoniconvertens|SGB14773 0.025434510
## 81 Anaerotruncus_massiliensis|SGB14965 -0.060526070
## 82 GGB9531_SGB14932|SGB14932 -0.266885285
## 83 Oscillibacter_valericigenes|SGB15076 -0.241013258
## 84 Anaerostipes_hadrus|SGB4547 0.029184007
## 85 GGB9616_SGB15052|SGB15052 0.086803074
## 86 GGB9524_SGB14924|SGB14924 0.055832021
## 87 Segatella_copri|SGB1626 0.087698990
## 88 Mediterraneibacter_butyricigenes|SGB25493 0.095770164
## 89 Eubacteriaceae_bacterium|SGB3958 -0.001895835
## 90 Enterocloster_hominis|SGB4721 -0.228002405
## 91 Blautia_luti|SGB4832 0.010659076
## 92 Blautia_sp_OF03_15BH|SGB4779 -0.036148950
## 93 Streptococcus_thermophilus|SGB8002 -0.564128139
## 94 Roseburia_hominis|SGB4659 0.032427335
## 95 Evtepia_gabavorous|SGB15120 0.028849861
## 96 Blautia_obeum|SGB4810 0.273885058
## 97 GGB3480_SGB4648|SGB4648 -0.049527836
## 98 Anaerobutyricum_soehngenii|SGB4537 0.027348622
## 99 Bacteroides_xylanisolvens|SGB1867 0.032899455
## 100 GGB9708_SGB15233|SGB15233 0.127259201
## 101 GGB51884_SGB49168|SGB49168 0.167279927
## 102 Veillonella_atypica|SGB6936 0.001449529
## 103 Enterocloster_bolteae|SGB4758 0.013936034
## 104 Roseburia_lenta|SGB4957 0.334732607
## 105 GGB3288_SGB4342|SGB4342 0.229783890
## 106 GGB9574_SGB14987|SGB14987 0.253072850
## 107 Eubacterium_sp_AF34_35BH|SGB5051 -0.048866235
## 108 Clostridiaceae_unclassified_SGB4771|SGB4771 0.264389510
## is_in_ExtValCoh
## 1 0
## 2 1
## 3 1
## 4 1
## 5 1
## 6 1
## 7 1
## 8 1
## 9 1
## 10 1
## 11 1
## 12 1
## 13 1
## 14 1
## 15 1
## 16 1
## 17 1
## 18 1
## 19 1
## 20 1
## 21 1
## 22 1
## 23 1
## 24 1
## 25 1
## 26 1
## 27 1
## 28 1
## 29 1
## 30 1
## 31 1
## 32 1
## 33 1
## 34 1
## 35 1
## 36 1
## 37 1
## 38 1
## 39 1
## 40 1
## 41 1
## 42 1
## 43 1
## 44 1
## 45 1
## 46 1
## 47 1
## 48 1
## 49 1
## 50 1
## 51 1
## 52 1
## 53 1
## 54 1
## 55 1
## 56 1
## 57 1
## 58 1
## 59 1
## 60 1
## 61 1
## 62 1
## 63 1
## 64 1
## 65 1
## 66 1
## 67 1
## 68 1
## 69 1
## 70 1
## 71 1
## 72 1
## 73 1
## 74 1
## 75 1
## 76 1
## 77 1
## 78 1
## 79 1
## 80 1
## 81 1
## 82 1
## 83 1
## 84 1
## 85 1
## 86 1
## 87 1
## 88 1
## 89 1
## 90 1
## 91 1
## 92 1
## 93 1
## 94 1
## 95 1
## 96 1
## 97 1
## 98 1
## 99 1
## 100 1
## 101 1
## 102 1
## 103 1
## 104 1
## 105 1
## 106 1
## 107 1
## 108 1
5.6 Plot beta coefficients
elacoef <- data.frame(
microbiome = row.names(elanet_microbiome_SGB30$betas),
beta_ela = elanet_microbiome_SGB30$betas[, 1]
) %>%
arrange(abs(beta_ela)) %>%
filter(abs(beta_ela) > 0,
!grepl('Intercept', microbiome)) %>%
mutate(microbiome = factor(microbiome, levels = microbiome))
plotac <- "elanet_beta_microbiome_SGB30"
path <- "gitignore/figures"
assign(plotac,
ggplot(elacoef,
aes(
x = microbiome,
y = beta_ela
)
) +
geom_point() +
geom_hline(yintercept = 0, color = "black") +
labs(
y = "Standardized beta coefficients",
x = "Bacteria species"
) +
theme_minimal() +
coord_flip() +
theme(
axis.text.x = element_text(size = 10),
axis.text.y = element_text(size = 10),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
legend.position = "bottom"
)
)
get(plotac)
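# save as SVG unless the file already exists; the ".svg" extension is added
# explicitly because ggsave() writes to the file name exactly as given
if (!file.exists(paste0(path, "/", plotac, ".svg"))) {
ggsave(
path = path,
filename = paste0(plotac, ".svg"),
device = "svg",
width = 7,
height = 14
)
}
6 External validation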
External validation was performed with an independent Czech cohort.
As a first step, we will use the previously developed and internally validated elastic net model to predict vegan status in the independent Czech cohort. To ensure comparability across datasets, the validation data will be standardized using the mean and standard deviation of each taxon estimated in the training cohort. For each subject in the external validation cohort, we will then estimate the predicted probability of being vegan from the elastic net model. This predicted probability will be used as a variable to discriminate between the diet groups in the independent cohort.
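The standardization and prediction steps described above can be sketched in base R as follows. This is a minimal illustration on made-up toy matrices: the taxon names and the coefficients `beta0` and `beta` are hypothetical and are not taken from the fitted model. `scale()` with explicit `center` and `scale` vectors applies the training means and SDs to the validation data, and `plogis()` is base R's inverse logit.

```r
# Toy training and validation matrices (rows = subjects, columns = taxa);
# all values are invented for illustration only
train <- matrix(c(1, 2, 3, 4,
                  2, 4, 6, 8), ncol = 2,
                dimnames = list(NULL, c("taxon_A", "taxon_B")))
valid <- matrix(c(2.5, 5,
                  3.0, 2), ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("taxon_A", "taxon_B")))

# Means and SDs are estimated in the training cohort only
train_mean <- colMeans(train)
train_sd <- apply(train, 2, sd)

# Centre and scale the validation data with the *training* statistics,
# not with statistics recomputed on the validation cohort
valid_std <- scale(valid, center = train_mean, scale = train_sd)

# Hypothetical elastic net coefficients on the standardized scale
beta0 <- 0.5
beta <- c(taxon_A = 0.8, taxon_B = -0.3)

# Linear predictor (logit) and predicted probability of being vegan
logit <- beta0 + as.vector(valid_std %*% beta)
prob <- plogis(logit)  # inverse logit: 1 / (1 + exp(-logit))
```

The first validation subject sits exactly at the training means, so its standardized values are zero and its logit equals the intercept.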
In a second step, we will examine the taxa that differed significantly between diet groups in the training cohorts (average vegan diet effect across both countries, FDR < 0.05), as estimated by linear models fitted separately for each taxon. We will then fit linear models for these taxa in the external validation cohort as well. The effect of the vegan diet on each of these taxa will be shown with its 95% confidence interval for all cohorts: the Czech and Italian training cohorts and the independent Czech validation cohort.
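The taxa selection in this second step amounts to adjusting the per-taxon p-values of the average diet effect for multiple testing and keeping the taxa below 0.05. This section does not state which adjustment was used; the sketch below assumes Benjamini-Hochberg (`p.adjust(method = "BH")`) and uses made-up p-values and taxon names.

```r
# Hypothetical per-taxon p-values for the average vegan diet effect
p_avg <- c(taxon_A = 0.01, taxon_B = 0.04, taxon_C = 0.03, taxon_D = 0.20)

# Benjamini-Hochberg adjustment (an assumption; the report only says "FDR");
# p.adjust() keeps the names of its input
fdr_avg <- p.adjust(p_avg, method = "BH")

# Taxa carried forward to the external validation step
diet_sensitive <- names(fdr_avg)[fdr_avg < 0.05]
```

Note that a raw p-value below 0.05 (here `taxon_B` and `taxon_C`) is not enough once the adjustment is applied.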
6.1 Prediction of diet (elastic net)
6.1.1 Get table of weights, means and SDs
coefs_microbiome_all <- get_coef(
original_data = data_analysis,
glmnet_model = elanet_microbiome_SGB30)
coefs_microbiome_all
## # A tibble: 295 × 5
## predictor beta_scaled beta_OrigScale mean SD
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.512 NA NA NA
## 2 Bacteroides_stercoris|SGB1830 0 0 0.565 4.53
## 3 Alistipes_putredinis|SGB2318 -0.0357 -0.276 4.70 3.87
## 4 Candidatus_Cibionibacter_quicibialis… 0 0 5.11 2.15
## 5 Bacteroides_uniformis|SGB1836 0 0 5.96 2.01
## 6 Eubacterium_siraeum|SGB4198 0 0 -1.27 3.66
## 7 GGB9602_SGB15031|SGB15031 -0.0726 -0.484 0.576 3.33
## 8 Phocaeicola_vulgatus|SGB1814 0.0553 0.291 5.48 2.63
## 9 Faecalibacterium_prausnitzii|SGB15316 0 0 5.08 2.09
## 10 Lachnospiraceae_bacterium_CLA_AA_H24… 0 0 3.30 1.72
## # ℹ 285 more rows
6.1.3 Standardize data in validation set
# standardize each predictor with the mean and SD estimated in the training
# cohort (columns `mean` and `SD` of coefs_microbiome_all)
data_microbiome_validation_pred_all <- data_microbiome_validation %>%
dplyr::mutate(
vegan = if_else(Diet == "VEGAN", 1, 0)
) %>%
dplyr::select(
vegan,
dplyr::all_of(common_predictors)
) %>%
dplyr::mutate(
across(
.cols = -vegan,
.fns = ~ {
idx <- match(cur_column(), coefs_microbiome_all$predictor)
(.x - coefs_microbiome_all$mean[idx]) / coefs_microbiome_all$SD[idx]
}
)
)
6.1.4 Result
elanet_microbiome_SGB30$fit
##
## Call: glmnet::glmnet(x = original_predictors, y = original_outcome, family = family, alpha = optim_par$alpha[1], lambda = optim_par$lamb_1se[1], standardize = standardize)
##
## Df %Dev Lambda
## 1 107 60 0.05295
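# predictor matrix of the validation cohort (all columns except the outcome);
# its columns must follow the predictor order used when fitting the model
newx <- as.matrix(data_microbiome_validation_pred_all[, -1])
tr <- data_microbiome_validation_pred_all %>%
dplyr::mutate(
# linear predictor on the logit scale (type = "link", the glmnet default)
predicted_logit = as.numeric(
predict(
elanet_microbiome_SGB30$fit,
newx = newx,
type = "link"
)
)
) %>%
dplyr::mutate(
# predicted probability of being vegan
predicted = inv_logit(predicted_logit)
)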
roc_microbiome_all <- pROC::roc(
vegan ~ predicted_logit,
data = tr,
direction = "<",
levels = c(0, 1),
ci = TRUE
)
roc_microbiome_all
##
## Call:
## roc.formula(formula = vegan ~ predicted_logit, data = tr, direction = "<", levels = c(0, 1), ci = TRUE)
##
## Data: predicted_logit in 43 controls (vegan 0) < 59 cases (vegan 1).
## Area under the curve: 0.8723
## 95% CI: 0.795-0.9496 (DeLong)
plotac <- "roc_microbiom_SGB30"
path <- "gitignore/figures"
assign(plotac, ggroc(roc_microbiome_all))
get(plotac)
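The AUC reported above has a direct interpretation: it is the probability that a randomly chosen vegan has a higher predicted logit than a randomly chosen omnivore, which matches the `direction = "<"` (controls < cases) setting used in `pROC::roc()`. A minimal base R sketch on made-up scores (not study data) computes it both from all case-control pairs and from ranks (the Mann-Whitney U statistic); it does not call pROC.

```r
# Toy predicted logits for omnivores (controls) and vegans (cases)
controls <- c(0.3, 0.5)
cases <- c(0.9, 0.8, 0.4)

# Pairwise definition: share of (case, control) pairs in which the case
# scores higher; ties (none here) would count as 1/2
pairs <- outer(cases, controls, "-")
auc_pairs <- mean((pairs > 0) + 0.5 * (pairs == 0))

# Equivalent rank-based form (Mann-Whitney U divided by n1 * n0)
r <- rank(c(cases, controls))
n1 <- length(cases)
n0 <- length(controls)
auc_rank <- (sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2) / (n1 * n0)
```

Five of the six case-control pairs are correctly ordered here, so both forms give 5/6. Because the AUC depends only on the ordering of scores, using the logit or the probability gives the same value.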
# save as SVG unless the file already exists (ggsave() uses the file name as given)
if (!file.exists(paste0(path, "/", plotac, ".svg"))) {
ggsave(
path = path,
filename = paste0(plotac, ".svg"),
device = "svg",
width = 6,
height = 4.5
)
}
6.2 Diet effect across datasets
As for the training cohorts, we will fit one linear model per selected taxon (CLR-transformed), here with diet as the single fixed-effect factor.
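Written out, the per-taxon model can be sketched as below, assuming the standard centred log-ratio (CLR) definition with zeros already handled upstream. Here i indexes subjects, j the taxon, D is the number of taxa in the composition used for the transformation, and VEGAN_i is the 0/1 indicator `Diet_VEGAN`; the coefficient β_1j corresponds to `logFD_VGdiet` in the results table.

```latex
\begin{aligned}
\operatorname{clr}(\mathbf{x}_i)_j &= \log\frac{x_{ij}}{g(\mathbf{x}_i)},
  \qquad g(\mathbf{x}_i) = \Bigl(\prod_{k=1}^{D} x_{ik}\Bigr)^{1/D},\\
\operatorname{clr}(\mathbf{x}_i)_j &= \beta_{0j} + \beta_{1j}\,\mathrm{VEGAN}_i + \varepsilon_{ij},
  \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_j^2).
\end{aligned}
```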
6.2.1 Linear models in validation cohort
diet_sensitive_taxa <- result_microbiome %>%
filter(fdr_VGdiet_avg < 0.05) %>%
pull(outcome)
len <- length(diet_sensitive_taxa)
data_analysis_microbiome <- data_microbiome_validation %>%
dplyr::mutate(
Diet_VEGAN = as.numeric(
dplyr::if_else(
Diet == 'VEGAN', 1, 0
)
)
) %>%
dplyr::select(
Diet_VEGAN,
all_of(diet_sensitive_taxa)
)
6.2.1.1 Define number of microbiome features and covariates
# Diet_VEGAN (first column) is the only covariate; the remaining columns are taxa
n_covarites <- 1
n_features <- ncol(data_analysis_microbiome) - n_covarites
6.2.1.2 Create empty objects
outcome <- vector('double', n_features)
logFD_VGdiet <- vector('double', n_features)
P_VGdiet <- vector('double', n_features)
CI_L_VGdiet <- vector('double', n_features)
CI_U_VGdiet <- vector('double', n_features)
6.2.1.3 Estimate over outcomes
for (i in 1:n_features) {
## define variable: [[ ]] returns a plain vector for data frames and tibbles alike
data_analysis_microbiome$outcome <- data_analysis_microbiome[[i + n_covarites]]
## fit model
model <- lm(outcome ~ Diet_VEGAN, data = data_analysis_microbiome)
## save results
outcome[i] <- names(data_analysis_microbiome)[i + n_covarites]
## diet effect: 95% confidence interval, estimate and p-value
tr <- confint(model)
CI_L_VGdiet[i] <- tr["Diet_VEGAN", 1]
CI_U_VGdiet[i] <- tr["Diet_VEGAN", 2]
logFD_VGdiet[i] <- summary(model)$coefficients["Diet_VEGAN", "Estimate"]
P_VGdiet[i] <- summary(model)$coefficients["Diet_VEGAN", "Pr(>|t|)"]
}
6.2.1.4 Results table
result_microbiome_val <- data.frame(
outcome,
logFD_VGdiet, P_VGdiet,
CI_L_VGdiet, CI_U_VGdiet
)
kableExtra::kable(result_microbiome_val,
caption = "Results of linear models estimating the effect of diet on CLR-transformed taxa proportions in the independent Czech validation cohort. Only bacteria that differed significantly between diet groups in the training cohorts (FDR < 0.05, average effect across both training cohorts) were included. `logFD` is the estimated effect (regression coefficient), i.e., how much the CLR-transformed taxa proportion differs between vegans and omnivores. `P` is the p-value, and `CI_L` and `CI_U` are the lower and upper bounds of the 95% confidence interval, respectively. All estimates in a single row are based on a single model."
)
| outcome | logFD_VGdiet | P_VGdiet | CI_L_VGdiet | CI_U_VGdiet |
|---|---|---|---|---|
| Escherichia_coli\|SGB10068 | -0.0678001 | 0.8780534 | -0.9422270 | 0.8066267 |
| Bacteroides_clarus\|SGB1832 | -0.4474756 | 0.1501145 | -1.0596258 | 0.1646745 |
| Hydrogenoanaerobacterium_saccharovorans\|SGB15350 | 0.0343774 | 0.8840093 | -0.4319367 | 0.5006916 |
| GGB6649_SGB9391\|SGB9391 | 0.6707753 | 0.0619526 | -0.0341736 | 1.3757242 |
| Ruthenibacterium_lactatiformans\|SGB15271 | -0.5139319 | 0.1557949 | -1.2268892 | 0.1990254 |
| Lawsonibacter_asaccharolyticus\|SGB15154 | -2.0366512 | 0.0000022 | -2.8405092 | -1.2327932 |
| Clostridium_fessum\|SGB4705 | 0.0036823 | 0.9877139 | -0.4695508 | 0.4769153 |
| GGB9707_SGB15229\|SGB15229 | -0.2846691 | 0.5637032 | -1.2596309 | 0.6902926 |
| Lachnospiraceae_bacterium_AM48_27BH\|SGB4706 | 0.2944198 | 0.1109060 | -0.0687547 | 0.6575944 |
| GGB9509_SGB14906\|SGB14906 | -0.5158608 | 0.1075919 | -1.1461894 | 0.1144678 |
| GGB45432_SGB63101\|SGB63101 | -0.3814408 | 0.3381234 | -1.1676982 | 0.4048166 |
| GGB2653_SGB3574\|SGB3574 | -0.7110515 | 0.0637825 | -1.4636353 | 0.0415322 |
| Phocea_massiliensis\|SGB14837 | -0.2249475 | 0.5453208 | -0.9603579 | 0.5104629 |
| Lachnospiraceae_bacterium\|SGB4953 | 1.1093932 | 0.1054861 | -0.2379349 | 2.4567214 |
| Holdemania_filiformis\|SGB4046 | -0.7909502 | 0.0051249 | -1.3392099 | -0.2426906 |
| Anaeromassilibacillus_senegalensis\|SGB14894 | 0.1582383 | 0.6873362 | -0.6195266 | 0.9360031 |
| Ruminococcus_sp_AF41_9\|SGB25497 | 1.5723968 | 0.0046260 | 0.4956389 | 2.6491548 |
| Eubacterium_ramulus\|SGB4959 | -0.7821065 | 0.0023325 | -1.2787575 | -0.2854555 |
| Youxingia_wuxianensis\|SGB82503 | -0.7728454 | 0.0336176 | -1.4845820 | -0.0611087 |
| Blautia_massiliensis\|SGB4826 | -0.3538921 | 0.3462847 | -1.0958595 | 0.3880752 |
| GGB9765_SGB15382\|SGB15382 | -0.3380327 | 0.4376177 | -1.1985609 | 0.5224954 |
| GGB9770_SGB15390\|SGB15390 | -0.2149847 | 0.5654590 | -0.9546180 | 0.5246486 |
| Ruminococcus_torques\|SGB4608 | -1.9375993 | 0.0002924 | -2.9616396 | -0.9135591 |
| Agathobaculum_butyriciproducens\|SGB14991 | 0.4691300 | 0.2780519 | -0.3842585 | 1.3225185 |
| Clostridium_SGB48024\|SGB48024 | 0.4767405 | 0.2762444 | -0.3872161 | 1.3406970 |
| GGB9531_SGB14932\|SGB14932 | -0.6312664 | 0.0872877 | -1.3565383 | 0.0940056 |
| Oscillibacter_valericigenes\|SGB15076 | -0.2233062 | 0.4190206 | -0.7692669 | 0.3226545 |
| GGB9616_SGB15052\|SGB15052 | 0.9145023 | 0.0161248 | 0.1732221 | 1.6557826 |
| GGB58158_SGB79798\|SGB79798 | -0.5931442 | 0.1208917 | -1.3453877 | 0.1590993 |
| Enterocloster_hominis\|SGB4721 | -2.5838231 | 0.0000001 | -3.4848975 | -1.6827488 |
| Streptococcus_thermophilus\|SGB8002 | -1.9051306 | 0.0000122 | -2.7259594 | -1.0843017 |
| Blautia_obeum\|SGB4810 | 2.0476532 | 0.0000342 | 1.1115125 | 2.9837939 |
| GGB51884_SGB49168\|SGB49168 | 0.1576822 | 0.5954702 | -0.4296599 | 0.7450243 |
| Roseburia_lenta\|SGB4957 | 1.9623313 | 0.0003619 | 0.9080098 | 3.0166528 |
| GGB3288_SGB4342\|SGB4342 | 0.1719754 | 0.6886263 | -0.6770096 | 1.0209604 |
| GGB9574_SGB14987\|SGB14987 | 0.8228106 | 0.0037310 | 0.2731300 | 1.3724911 |
| Clostridiaceae_unclassified_SGB4771\|SGB4771 | 0.3951355 | 0.3029642 | -0.3619755 | 1.1522465 |
| Lachnospiraceae_bacterium_OM04_12BH\|SGB4893 | -0.3275102 | 0.5099089 | -1.3100021 | 0.6549817 |
| GGB9616_SGB15051\|SGB15051 | 0.5125892 | 0.1770983 | -0.2355437 | 1.2607221 |
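The `CI_L_VGdiet` and `CI_U_VGdiet` columns come from `confint()`, which for a linear model is the estimate ± the 0.975 quantile of the t distribution (on the residual degrees of freedom) times the standard error. With a single 0/1 predictor, the estimate itself is simply the vegan minus omnivore difference in mean CLR value. A small sketch on simulated toy data (not study data; `toy` and `fit` are illustrative names) checks both facts, plus the two-sided p-value.

```r
set.seed(1)
# Toy data: a CLR-like outcome and a 0/1 vegan indicator
toy <- data.frame(
  Diet_VEGAN = rep(c(0, 1), each = 10),
  y = c(rnorm(10, mean = 0), rnorm(10, mean = 1))
)
fit <- lm(y ~ Diet_VEGAN, data = toy)

coefs <- coef(summary(fit))
est <- coefs["Diet_VEGAN", "Estimate"]
se <- coefs["Diet_VEGAN", "Std. Error"]
t_crit <- qt(0.975, df = df.residual(fit))  # n - 2 residual df

# 95% CI by hand versus confint()
ci_manual <- c(est - t_crit * se, est + t_crit * se)
ci_confint <- unname(confint(fit)["Diet_VEGAN", ])

# Two-sided t-test p-value for the diet coefficient
p_manual <- 2 * pt(abs(est / se), df = df.residual(fit), lower.tail = FALSE)
```

A 95% interval that excludes zero corresponds to P < 0.05 for the same coefficient, which is why the two columns agree in the table above.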
# write a comma-separated file (write.table() defaults to space-separated output)
if (!file.exists("gitignore/result_microbiom_validation_SGB30.csv")) {
write.csv(result_microbiome_val,
"gitignore/result_microbiom_validation_SGB30.csv",
row.names = FALSE
)
}
6.2.2 Forest plot
6.2.2.1 Prepare data
## subset result tables
result_microbiome_subset <- result_microbiome %>%
filter(outcome %in% diet_sensitive_taxa)
result_microbiome_val_subset <- result_microbiome_val %>%
filter(outcome %in% diet_sensitive_taxa)
## create a data frame
data_forest <- data.frame(
outcome = rep(diet_sensitive_taxa, 3),
beta = c(
result_microbiome_subset$logFD_VGdiet_inCZ,
result_microbiome_subset$logFD_VGdiet_inIT,
result_microbiome_val_subset$logFD_VGdiet
),
lower = c(
result_microbiome_subset$CI_L_VGdiet_inCZ,
result_microbiome_subset$CI_L_VGdiet_inIT,
result_microbiome_val_subset$CI_L_VGdiet
),
upper = c(
result_microbiome_subset$CI_U_VGdiet_inCZ,
result_microbiome_subset$CI_U_VGdiet_inIT,
result_microbiome_val_subset$CI_U_VGdiet
),
dataset = c(
rep("CZ", len),
rep("IT", len),
rep("Validation", len)
)
)
## define ordering
validation_order <- data_forest %>%
filter(dataset == "Validation") %>%
arrange(beta) %>%
pull(outcome)
## Define 'winners'
up_winners <- data_forest %>%
pivot_wider(names_from = dataset,
values_from = c(beta, lower, upper)) %>%
left_join(elacoef %>% mutate(outcome = microbiome) %>% select(-microbiome),
by = 'outcome') %>%
filter(beta_CZ > 0,
beta_IT > 0,
lower_Validation > 0,
beta_ela > 0.1) %>%
pull(outcome)
down_winners <- data_forest %>%
pivot_wider(names_from = dataset,
values_from = c(beta, lower, upper)) %>%
left_join(elacoef %>% mutate(outcome = microbiome) %>% select(-microbiome),
by = 'outcome') %>%
filter(beta_CZ < 0,
beta_IT < 0,
upper_Validation < 0,
beta_ela < -0.1) %>%
pull(outcome)
winners <- c(up_winners, down_winners)
data_forest <- data_forest %>%
mutate(in_winner = if_else(outcome %in% winners, TRUE, FALSE, missing = FALSE)) %>%
left_join(
elacoef %>% mutate(outcome = microbiome) %>% select(-microbiome),
by = 'outcome') %>%
mutate(outcome = factor(outcome, levels = validation_order))
6.2.2.2 Plotting
plotac <- "forest_plot_microbiome_SGB30"
path <- "gitignore/figures"
colors <- c("CZ" = "#150999", "IT" = "#329243", "Validation" = "grey60")
assign(plotac, ggplot(
data_forest, aes(x = outcome, y = beta, ymin = lower, ymax = upper, color = dataset)
) +
geom_pointrange(position = position_dodge(width = 0.5), size = 0.5) +
geom_hline(yintercept = 0, color = "black") +
geom_errorbar(position = position_dodge(width = 0.5), width = 0.2) +
scale_color_manual(values = colors) +
labs(
y = "Effect of vegan diet on CLR-transformed taxa proportion",
x = "Outcome",
color = "Dataset"
) +
theme_minimal() +
coord_flip() + # Flip coordinates to have outcomes on the y-axis
scale_x_discrete(
labels = setNames(
ifelse(data_forest$in_winner,
paste0("**", data_forest$outcome, "**"),
as.character(data_forest$outcome)
), data_forest$outcome
)
) +
theme(
axis.text.x = element_text(size = 10),
axis.text.y = ggtext::element_markdown(size = 10),
axis.title.x = element_text(size = 12),
axis.title.y = element_text(size = 12),
legend.position = "bottom"
)
)
get(plotac)
Diet, Country, and the interaction term Diet:Country as predictors. In the independent Czech validation cohort, Diet was the only fixed-effect predictor. Taxa validated in the linear model and showing predictive power in the elastic net model (|β| > 0.1) are shown in bold.
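The bolding rule in the caption corresponds to the `up_winners` and `down_winners` filters in the data preparation step. The sketch below restates those criteria in base R on a small hypothetical table (taxon names and values are invented) so the logic can be checked in isolation; it mirrors the filters but is not the report's own code.

```r
# Hypothetical per-taxon summary in the wide format produced by pivot_wider()
tab <- data.frame(
  outcome          = c("taxon_A", "taxon_B", "taxon_C"),
  beta_CZ          = c(0.8, -0.5, 0.4),
  beta_IT          = c(0.6, -0.7, -0.2),
  lower_Validation = c(0.2, -1.6, 0.1),
  upper_Validation = c(1.4, -0.3, 0.9),
  beta_ela         = c(0.25, -0.30, 0.15)
)

# "Up" winners: positive effect in both training cohorts, 95% CI above zero
# in the validation cohort, and elastic net beta > 0.1
is_up <- with(tab, beta_CZ > 0 & beta_IT > 0 & lower_Validation > 0 & beta_ela > 0.1)

# "Down" winners: the mirror-image criteria
is_down <- with(tab, beta_CZ < 0 & beta_IT < 0 & upper_Validation < 0 & beta_ela < -0.1)

# which() drops NA, as dplyr::filter() does, so a taxon with no elastic net
# beta (not selected by the model) can never be a winner
winners <- tab$outcome[which(is_up | is_down)]
```

`taxon_C` fails because its Italian effect points the other way, even though its validation interval excludes zero.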
# save as SVG unless the file already exists (ggsave() uses the file name as given)
if (!file.exists(paste0(path, "/", plotac, ".svg"))) {
ggsave(
path = path,
filename = paste0(plotac, ".svg"),
device = "svg",
width = 9,
height = 13
)
}
6.2.3 Boxplot
plotac <- "boxplot_microbiome_SGB30"
path <- "gitignore/figures"
colo <- c('#F9FFAF','#329243')
boxplot_cond <- function(variable) {
p <- ggboxplot(data_merged,
x = 'Diet',
y = variable,
fill = 'Diet',
tip.length = 0.15,
palette = colo,
outlier.shape = 1,
lwd = 0.25,
outlier.size = 0.8,
facet.by = 'Data',
title = variable,
ylab = 'CLR(taxa proportion)') +
theme(
plot.title = element_text(size = 10),
axis.title = element_text(size = 8),
axis.text.y = element_text(size = 7),
axis.text.x = element_blank(),
axis.title.x = element_blank()
)
return(p)
}
# Plot all outcomes
plots <- map(diet_sensitive_taxa, boxplot_cond)
# Create a matrix of plots
assign(plotac,
ggarrange(plotlist = plots, ncol = 3, nrow = 3, common.legend = TRUE)
)
get(plotac)
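# ggarrange() splits the plots over several pages when they do not fit in a
# single 3 x 3 grid; save every page to its own SVG file
pages <- get(plotac)
if (inherits(pages, "ggplot")) pages <- list(pages)
for (k in seq_along(pages)) {
file_k <- paste0(plotac, "_", k, ".svg")
if (!file.exists(paste0(path, "/", file_k))) {
ggsave(
plot = pages[[k]],
path = path,
filename = file_k,
device = "svg",
width = 7,
height = 7
)
}
}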
7 Reproducibility
sessionInfo()
## R version 4.4.3 (2025-02-28)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.5 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=cs_CZ.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=cs_CZ.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=cs_CZ.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=cs_CZ.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Europe/Prague
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] vegan_2.6-6.1 lattice_0.22-5 permute_0.9-7
## [4] zCompositions_1.5.0-4 truncnorm_1.0-8 NADA_1.6-1.1
## [7] survival_3.7-0 MicrobiomeStat_1.2 glmnet_4.1-8
## [10] pROC_1.18.0 arm_1.12-2 lme4_1.1-35.5
## [13] Matrix_1.7-0 MASS_7.3-65 car_3.1-2
## [16] carData_3.0-5 emmeans_1.10.4 brms_2.21.0
## [19] Rcpp_1.0.13 rms_6.8-1 Hmisc_5.1-3
## [22] glmmTMB_1.1.9 ggtext_0.1.2 ggdist_3.3.2
## [25] cowplot_1.1.1 ggpubr_0.4.0 sjPlot_2.8.16
## [28] kableExtra_1.4.0 flextable_0.9.6 gtsummary_2.0.2
## [31] compositions_2.0-8 janitor_2.2.0 stringi_1.7.6
## [34] lubridate_1.8.0 forcats_1.0.0 stringr_1.5.1
## [37] dplyr_1.1.4 purrr_1.0.2 readr_2.1.2
## [40] tidyr_1.3.1 tibble_3.2.1 ggplot2_3.5.1
## [43] tidyverse_1.3.1 readxl_1.3.1 openxlsx_4.2.8
## [46] RJDBC_0.2-10 rJava_1.0-6 DBI_1.1.2
##
## loaded via a namespace (and not attached):
## [1] fs_1.6.4 matrixStats_1.3.0 httr_1.4.2
## [4] insight_0.20.2 numDeriv_2016.8-1.1 tools_4.4.3
## [7] backports_1.5.0 sjlabelled_1.2.0 utf8_1.2.4
## [10] R6_2.5.1 mgcv_1.9-1 withr_3.0.1
## [13] Brobdingnag_1.2-7 prettyunits_1.1.1 gridExtra_2.3
## [16] bayesm_3.1-6 quantreg_5.98 cli_3.6.3
## [19] textshaping_0.3.6 performance_0.12.2 officer_0.6.6
## [22] sandwich_3.0-1 labeling_0.4.2 mvtnorm_1.1-3
## [25] robustbase_0.93-9 polspline_1.1.25 ggridges_0.5.3
## [28] askpass_1.1 QuickJSR_1.3.1 commonmark_1.9.1
## [31] systemfonts_1.0.4 StanHeaders_2.32.10 foreign_0.8-88
## [34] gfonts_0.2.0 svglite_2.1.3 rstudioapi_0.16.0
## [37] httpcode_0.3.0 generics_0.1.3 shape_1.4.6
## [40] distributional_0.4.0 zip_2.2.0 inline_0.3.19
## [43] loo_2.4.1 fansi_1.0.6 abind_1.4-5
## [46] lifecycle_1.0.4 multcomp_1.4-18 yaml_2.3.5
## [49] snakecase_0.11.1 grid_4.4.3 promises_1.2.0.1
## [52] crayon_1.5.0 haven_2.4.3 pillar_1.9.0
## [55] knitr_1.48 statip_0.2.3 boot_1.3-31
## [58] estimability_1.5.1 codetools_0.2-19 glue_1.7.0
## [61] V8_4.4.2 fontLiberation_0.1.0 data.table_1.15.4
## [64] vctrs_0.6.5 cellranger_1.1.0 gtable_0.3.0
## [67] assertthat_0.2.1 datawizard_0.12.2 xfun_0.46
## [70] mime_0.12 coda_0.19-4 modeest_2.4.0
## [73] timeDate_3043.102 iterators_1.0.14 statmod_1.4.36
## [76] ellipsis_0.3.2 TH.data_1.1-0 nlme_3.1-167
## [79] fontquiver_0.2.1 rstan_2.32.6 fBasics_4041.97
## [82] tensorA_0.36.2.1 TMB_1.9.14 rpart_4.1.24
## [85] colorspace_2.0-2 nnet_7.3-20 tidyselect_1.2.1
## [88] processx_3.8.4 timeSeries_4032.109 compiler_4.4.3
## [91] curl_4.3.2 rvest_1.0.2 htmlTable_2.4.0
## [94] SparseM_1.81 xml2_1.3.3 fontBitstreamVera_0.1.1
## [97] posterior_1.6.0 checkmate_2.3.2 scales_1.3.0
## [100] DEoptimR_1.0-10 callr_3.7.6 spatial_7.3-15
## [103] digest_0.6.37 minqa_1.2.4 rmarkdown_2.27
## [106] htmltools_0.5.8.1 pkgconfig_2.0.3 base64enc_0.1-3
## [109] stabledist_0.7-2 dbplyr_2.1.1 fastmap_1.2.0
## [112] rlang_1.1.4 htmlwidgets_1.6.4 shiny_1.9.1
## [115] farver_2.1.0 zoo_1.8-9 jsonlite_1.8.8
## [118] magrittr_2.0.3 Formula_1.2-4 bayesplot_1.8.1
## [121] munsell_0.5.0 gdtools_0.3.7 stable_1.1.6
## [124] plyr_1.8.6 pkgbuild_1.3.1 parallel_4.4.3
## [127] ggrepel_0.9.5 sjmisc_2.8.10 ggeffects_1.7.0
## [130] splines_4.4.3 gridtext_0.1.5 hms_1.1.1
## [133] sjstats_0.19.0 ps_1.7.7 uuid_1.0-3
## [136] markdown_1.13 ggsignif_0.6.3 stats4_4.4.3
## [139] rmutil_1.1.10 rstantools_2.1.1 crul_1.5.0
## [142] reprex_2.0.1 evaluate_1.0.0 RcppParallel_5.1.8
## [145] modelr_0.1.8 nloptr_2.0.0 tzdb_0.2.0
## [148] foreach_1.5.2 httpuv_1.6.5 MatrixModels_0.5-3
## [151] openssl_1.4.6 clue_0.3-65 broom_1.0.6
## [154] xtable_1.8-4 rstatix_0.7.0 later_1.3.0
## [157] viridisLite_0.4.0 ragg_1.2.1 lmerTest_3.1-3
## [160] cluster_2.1.8.1 bridgesampling_1.1-2